diff --git a/flang/docs/HighLevelFIR.md b/flang/docs/HighLevelFIR.md new file mode 100644 --- /dev/null +++ b/flang/docs/HighLevelFIR.md @@ -0,0 +1,1410 @@ +The approach of FIR and lowering design so far was to start with the minimal set +of IR operations that could allow implementing the core aspects of Fortran (like +memory allocations, array addressing, runtime descriptors, and structured +control flow operations). One notable aspect of the current FIR is that array +and character operations are buffered (some storage is allocated for the result, +and the storage is addressed to implement the operation). While this proved +functional so far, the code lowering expressions and assignments from the +front-end representations (the evaluate::Expr and parser nodes) to FIR has +significantly grown in complexity while it still lacks some F95 features around +character array expressions or FORALL. This is mainly explained by the fact that +the representation level gap is big, and a lot is happening in lowering. It +appears more and more that some intermediate steps would help to split concerns +between translating the front-end representation to MLIR, implementing some +Fortran concepts at a lower-level (like character or derived type assignments), +and how bufferizations of character and array expressions should be done. + +This document proposes the addition of two concepts and a set of related +operations in a new dialect HLFIR to allow a simpler lowering to a higher-level +FIR representation that would later be lowered to the current FIR representation +via MLIR translation passes. As a result of these additions, it is likely that +the fir.array_load/fir.array_merge_store and related array operations could be +removed from FIR since array assignment analysis could directly happen on the +higher-level FIR representation. 
+ + +The main principles of the new lowering design are: +- Make expression lowering context independent and rather naive +- Do not materialize temporaries while lowering to FIR +- Preserve Fortran semantics/information for high-level optimizations + +The core impact on lowering will be: +- Lowering expressions and assignments in the exact same way, regardless of + whether it is an array assignment context and/or an expression inside a + forall. +- Lowering transformational intrinsics in a verbatim way (no runtime calls and + memory aspects yet). +- Lowering character expressions in a verbatim way (no memcpy/runtime calls + and memory aspects yet). +- Argument association side effects will be delayed (copy-in/copy-out) to help + inlining/function specialization to get rid of them when they are not + relevant. + + +## Variable and Expression value concepts in HLFIR + +## Strengthening the variable concept + +Fortran variables are currently represented in FIR as mlir::Value with reference +or box type coming from special operations or block arguments. They are either +the result of a fir.alloca, fir.allocmem, or fir.address_of operations with the +mangled name of the variable as attribute, or they are function block arguments +with the mangled name of the variable as attribute. + +Fortran variables are defined with a Fortran type (both dynamic and static) that +may have type parameters, a rank and shape (including lower bounds), and some +attributes (like TARGET, OPTIONAL, VOLATILE...). All this information is +currently not represented in FIR. Instead, lowering keeps track of all this +information in the fir::ExtendedValue lowering data structure and uses it when +needed. If unused in lowering, some information about variables is lost (like +non-constant array bound expressions). 
In the IR, only the static type, the +compile time constant extents, and compile time character lengths can be +retrieved from the mlir::Value of a variable in the general case (more can be +retrieved if the variable is tracked via a fir.box, but not if it is a bare +memory reference). + +This makes reasoning about Fortran variables in FIR harder, and in general +forces lowering to apply all decisions related to the information that is lost +in FIR. A more problematic point is that it does not allow generating debug +information for the variables from FIR, since the bounds and type parameters +information is not tightly linked to the base mlir::Value. + +The proposal is to add a fir.declare operation that would anchor the +fir::ExtendedValue information in the IR regardless of the mlir::Value used for +the variable (bare memory reference, or fir.box). This operation will have a +"fir.def = uniq_mangled_variable_name" that will allow linking it to the Fortran +source variable, and will take all the bounds and type parameters as operands. +All the high-level operations referring to variables will have a "fir.ref = +uniq_mangled_variable_name" that will allow retrieving back the related +dominating fir.declare and all the variable information. In most of the cases, +the fir.declare should simply be the defining operation of the operand mlir +value. + +The fir.declare operation will allow: +- Pushing higher-level Fortran concepts into FIR operations (like array + assignments or transformational intrinsics). +- Generating debug information for the variables based on the fir.declare + operation. +- Generic Fortran aliasing analysis (currently implemented only around array + assignments with the fir.array_load concept). + +The fir.declare op is the only operation described by this change that will be +added to FIR and not HLFIR. 
The rationale for this is that it is intended to +survive until LLVM dialect code generation so that debug info generation can use +it and alias analysis can take advantage of it even on FIR. + +Note that Fortran variables are not necessarily named objects, they can also be +the result of function references returning POINTERs. fir.declare will also +accept such variables to be described in the IR (a unique name will be built +from the caller scope name and the function name). In general, fir.declare +will allow viewing every memory storage as a variable, and this will be used to +describe and use compiler created array temporaries. + +## Adding an expression value concept in HLFIR + +Currently, Fortran expressions can be represented as SSA values for scalar +logical, integer, real, and complex expressions. Scalar character or +derived-type expressions and all array expressions are buffered in lowering: +their results are directly given a memory storage in lowering and are +manipulated as variables. + +While this keeps FIR simple, this makes the amount of IR generated for these +expressions higher, and in general makes the job of later optimization passes harder +since these expressions present non-trivial patterns (with memory operations) and cannot be +eliminated by naive dead code elimination when the result is unused. This also +forces lowering to combine elemental array expressions into single loop nests to +avoid bufferizing all array sub-expressions (which would yield terrible +performance). These combinations, which are implemented using C++ lambdas in +lowering, make lowering code harder to understand. They also make the expression +lowering code context dependent (especially designator lowering). The lowering +code paths may be different when lowering a syntactically similar expression in +an elemental expression context, in a forall context, or in a normal context.
+ +Some of the combinations described in [Array Composition](ArrayComposition.md) +are currently not implemented in lowering because they are less trivial +optimizations, and do not really belong in lowering. However, deploying such +combinations on the generated FIR with bufferizations requires the usage of +non-trivial pattern matching and rewrites (recognizing temporary allocation, +usage, and related runtime calls). Note that the goal of such combination is not +only about inlining transformational runtime calls, it is mainly about never +generating a temporary for an array expression sub-operand that is a +transformational intrinsic call matching certain criteria. So the optimization +pass will not only need to recognize the intrinsic call, it must understand the +context it is being called in. + +The usage of memory manipulations also makes some of the alias analysis more +complex, especially when dealing with foralls (the alias analysis cannot simply +follow an operand tree, it must understand indirect dependencies from operations +stored in memory). + +The proposal is to add a !hlfir.expr SSA value type concept, and set of +character operations (concatenation, TRIM, MAX, MIN, comparisons...), a set of +array transformational operations (SUM, MATMUL, TRANSPOSE, ...), and a generic +hlfir.elemental operation. The hlfir.expr type is not intended to be used +with scalar types that already have SSA value types (e.g., integer or real +scalars). Instead, these existing SSA types will implicitly be considered as +being expressions when used in high-level FIR operations, which will simplify +interfacing with other dialects that define operations with these types (e.g., +the arith dialect). + +These hlfir.expr values could then be placed in memory when needed (assigned to +a variable, passed as a procedure argument, or an IO output item...) via +hlfir.assign or hlfir.associate operations that will later be described. 
+ +When no special optimization pass is run, a translation pass would lower the +operations producing hlfir.expr to buffer allocations and memory operations just +as in the currently generated FIR. + +However, these high-level operations should allow the writing of optimization +passes combining chains of operations producing hlfir.expr into optimized forms +via pattern matching on the operand tree. + +The hlfir.elemental operation will be discussed in more detail below. It allows +simplifying lowering while keeping the ability to combine elemental +sub-expressions into a single loop nest. It should also allow rewriting some of +the transformational intrinsic operations to functions of the indices as +described in [Array Composition](ArrayComposition.md). + +## Proposed design for HLFIR (High-Level Fortran IR) + +### HLFIR Operations and Types + +#### Introduce a hlfir.expr type + +Motivation: avoid the need to materialize expressions in temporaries while +lowering. + +Syntax: ``` !hlfir.expr<[extent x]* T [, class]> ``` + +- `[extent x]*` represents the shape for arrays similarly to !fir.array<> type, + except that the shape cannot be assumed rank (!hlfir.expr<..xT> is invalid). + This restriction can be added because it is impossible to create an assumed + rank expression in Fortran that is not a variable. +- `T` is the element type of the static type +- `class` flag can be set to denote that this a polymorphic expression (that the + dynamic type should not be assumed to be the static type). + + +examples: !hlfir.expr>, !hlfir.expr<10xi32>, +!hlfir.expr> + +T in scalar hlfir.expr can be: +- A character type (fir.char<10, kind>, fir.char) +- A derived type: (fir.type) + +T in an array hlfir.expr< e1 x ex2 .. : T> can be: +- A character or derived type +- A logical type (fir.logical) +- An integer type (i1, i32, ….) +- A floating point type (f32, f16…) +- A complex type (fir.complex<4> or mlir::complex...) 
+ +Some expressions may be polymorphic (for instance, MERGE can be used on +polymorphic entities). The hlfir.expr type has an optional "class" flag to +denote this: hlfir.expr. + +Note that the ALLOCATABLE, POINTER, TARGET, VOLATILE, ASYNCHRONOUS, OPTIONAL +aspects do not apply to expressions, they apply to variables. + +It is possible to query the following about an expression: +- What is the extent : via hlfir.get_extent %expr, dim +- What are the length parameters: via hlfir.get_typeparam %expr [, param_name] +- What is the dynamic type: via hlfir.get_dynamic_type %expr + +It is possible to get the value of an array expression element: +- %element = hlfir.apply %expr, %i, %j : (!hlfir.expr, index index) -> + hlfir.expr | AnyConstantSizeScalarType + +It is not directly possible to take an address for the expression, but an +expression value can be associated to a new variable whose address can be used +(required when passing the expression in a user call, or to concepts that are +kept low level in FIR, like IO runtime calls). The variable created may be a +compiler created temporary, or may relate to a Fortran source variable if this +mechanism is used to implement ASSOCIATE. + +- %var = hlfir.associate %expr [attributes about the association]-> + AnyMemoryOrBoxType +- hlfir.end_association %var + +The intention is that the hlfir.expr is the result of an operation, and +should most often not be a block argument. This is because the hlfir.expr is +mostly intended to allow combining chains of operations into more optimal +forms. But it is possible to represent any expression result via a Fortran +runtime descriptor (fir.box), implying that if a hlfir.expr is passed as +a block argument, the expression bufferization pass will evaluate the operation +producing the expression in a temporary, and transform the block operand into a +fir.box describing the temporary. Clean-up for the temporary will be inserted +after the last use of the hlfir.expr. 
Note that, at least at first, lowering +may help FIR to find the last use of a hlfir.expr by explicitly inserting a +hlfir.finalize %expr operation that may turn into a no-op if the expression is +not later materialized in memory. + +It is nonetheless not intended that such abstract types be used as block +arguments to avoid introducing allocations and descriptor manipulations. + +#### fir.declare operation + +Motivation: represent variables, linking together a memory storage, shape, +length parameters, attributes and the variable name. + +Syntax: +``` +%var = fir.declare %base [shape %extent1, %extent2, ...] [lbs %lb1, %lb2, ...] [typeparams %l1, ...] {fir.def = mangled_variable_name, attributes} : [(....) ->] T +``` + +%var will have the same type as %base. When no debug info is generated, the +operation can be replaced by %base when lowering to LLVM. + +- Extents should only be provided if %base is not a fir.box and the entity is an + array. +- lower bounds should only be provided if the entity is an array and the lower + bounds are not default (all ones). It should also not be provided for POINTERs + and ALLOCATABLES since the lower bounds may change. +- type parameters should be provided for entities with length parameters, unless + the entity is a CHARACTER where the length is constant in %base type. +- The attributes will include the Fortran attributes: TARGET (fir.target), + POINTER (fir.ptr), ALLOCATABLE (fir.alloc), CONTIGUOUS (fir.contiguous), + OPTIONAL (fir.optional), VOLATILE (fir.volatile), ASYNCHRONOUS (fir.async). + They will also indicate when an entity is part of an equivalence by giving the + equivalence name (fir.equiv = mangled_equivalence_name). + +fir.declare will be used for all Fortran variables, except the ones created via +the ASSOCIATE construct that will use hlfir.associate described below. + +fir.declare will also be used when creating compiler created temporaries, in +which case the fir.tmp attribute will be given. 
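The operand rules listed above can be read as a small consistency check. The following Python sketch is only a conceptual model of those rules: `Declare` and `check_declare` are hypothetical names introduced for illustration, and nothing here reflects how the operation or its verifier would actually be written in tablegen and C++.

```python
from dataclasses import dataclass, field


@dataclass
class Declare:
    """Toy model of the operands of a fir.declare (hypothetical)."""
    name: str                                   # value of fir.def
    base_is_box: bool                           # True if %base is a fir.box
    rank: int = 0                               # 0 for scalars
    extents: list = field(default_factory=list)
    lbs: list = field(default_factory=list)
    attrs: set = field(default_factory=set)     # e.g. {"fir.ptr"}


def check_declare(d: Declare) -> list:
    """Return the rule violations, following the bullet list above."""
    errors = []
    is_array = d.rank > 0
    # Extents: only if %base is not a fir.box and the entity is an array.
    if d.extents and (d.base_is_box or not is_array):
        errors.append("extents given for a boxed or scalar entity")
    if d.lbs:
        # Lower bounds: only for arrays, never for POINTER/ALLOCATABLE,
        # and only when they differ from the default (all ones).
        if not is_array:
            errors.append("lower bounds given for a scalar")
        if d.attrs & {"fir.ptr", "fir.alloc"}:
            errors.append("lower bounds given for a POINTER or ALLOCATABLE")
        if all(lb == 1 for lb in d.lbs):
            errors.append("default lower bounds should be omitted")
    return errors
```

For instance, a model of `REAL :: X(0:)` (`Declare("x", base_is_box=True, rank=1, lbs=[0])`) passes the check, while attaching lower bounds to a scalar CHARACTER variable is reported as a violation.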
+ +Examples: + +| FORTRAN | FIR | +| ----------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| REAL :: X | %mem = fir.alloca f32
%x = fir.declare %mem {fir.def = "\_QPfooEx"} : fir.ref<f32> | +| REAL, TARGET :: X(10) | %mem = fir.alloca fir.array<10xf32>
%x = fir.declare %mem {fir.def = "\_QPfooEx", fir.target} : fir.ref<fir.array<10xf32>> | +| REAL :: X(N) | %mem = // … alloc or dummy argument
%nval = fir.load %n : i64
%x = fir.declare %mem shape %nval {fir.def = "\_QPfooEx"} : (i64) -> fir.ref<fir.array<?xf32>> | +| REAL :: X(0:) | %mem = // … dummy argument
%c0 = arith.constant 0 : index
%x = fir.declare %mem lbs %c0 {fir.def = "\_QPfooEx"} : (index) -> fir.box<fir.array<?xf32>> | +|
REAL, POINTER :: X(:) | %mem = // … dummy argument, or local, or global
%x = fir.declare %mem {fir.def = "\_QPfooEx", fir.ptr} : fir.ref<fir.box<fir.ptr<fir.array<?xf32>>>> | +| REAL, ALLOCATABLE :: X(:) | %mem = // … dummy argument, or local, or global
%x = fir.declare %mem {fir.def = "\_QPfooEx", fir.alloc} : fir.ref<fir.box<fir.heap<fir.array<?xf32>>>> | +| CHARACTER(10) :: C | %mem = // … dummy argument, or local, or global
%c = fir.declare %mem {fir.def = "\_QPfooEc"} : fir.ref> | +| CHARACTER(\*) :: C | %unbox = fir.unboxchar %boxchar (fir.boxchar<1>) -> (fir.ref>, index)
%c = fir.declare %unbox#0 typeparams %unbox#1 {fir.def = "\_QPfooEc"} : (index) -> fir.ref> | +| CHARACTER(\*), OPTIONAL, ALLOCATABLE :: C | %mem = // … dummy argument
%c = fir.declare %mem {fir.def = "\_QPfooEc", fir.alloc, fir.optional, fir.assumed\_len\_alloc} : fir.ref>>> | +| TYPE(T) :: X | %mem = // … dummy argument, or local, or global
%x = fir.declare %mem {fir.def = "\_QPfooEx"} : fir.ref> | +| TYPE(T(L)) :: X | %mem = // … dummy argument, or local, or global
%lval = fir.load %l
%x = fir.declare %mem typeparams %lval {fir.def = "\_QPfooEx"} : fir.box> | +| CLASS(\*), POINTER :: X | %mem = // … dummy argument, or local, or global
%x = fir.declare %mem {fir.def = "\_QPfooEx", fir.ptr} : fir.class> | +| REAL :: X(..) | %mem = // … dummy argument
%x = fir.declare %mem {fir.def = "\_QPfooEx"} : fir.box> | + +#### hlfir.associate operation + +Motivation: represent Fortran associations (both from variables and expressions) +and allow keeping actual/dummy argument association information after inlining. + +Syntax: +``` +%var = hlfir.associate %expr_or_var {fir.def = mangled_uniq_name, attributes} (AnyExprOrVarType) -> AnyVarType +``` + +hlfir.associate is used to represent the following associations: +- Dummy/Actual association on the caller side (the callee side uses + fir.declare). +- Host association in block constructs when VOLATILE/ASYNCHRONOUS attributes are added + locally +- ASSOCIATE construct (both from variables and expressions). + +When the operand is a variable, hlfir.associate allows changing the attributes +of the variable locally, and encoding certain side-effects (like +copy-in/copy-out when going from a non-contiguous variable to a contiguous +variable, with the help of the related hlfir.end_association operation). + +When the operand is an expression, hlfir.associate allows associating a storage +location to an expression value. + +A hlfir.associate must be followed by a related hlfir.end_association that will +allow inserting any necessary finalization or copy-out later. + +#### hlfir.end_association operation + +Motivation: mark the place where some association should end and some side +effects might need to occur. + +The hlfir.end_association is a placeholder to later insert +deallocation/finalization if the variable was associated with an expression, +and to insert copy-out/deallocation if the variable was associated with another +variable with a copy-in.
+ +Syntax: +``` +hlfir.end_association %var [%original_variable] {fir.ref = var_mangled_name, attributes} +``` + + +The attributes can be: +- copy_out (copy out the associated variable back into the original variable + if a copy-in occurred) +- finalize_copy_in (deallocate the temporary storage for the associated + variable if a copy-in occurred but the associated variable was not modified + (e.g., it is intent(in))). +- finalize: indicate that a finalizer should be run on the entity associated + with the variable (There is currently no way to deduce this only from the + variable type in FIR). It will give the finalizer mangled name so that it + can be later called. + +If the copy_out or finalize_copy_in attribute is set, “original_variable” (the +argument of the hlfir.associate that produced %var) must be provided. The +rationale is that the original variable address is needed to verify if a +temporary was created, and if needed, to copy the data back to it. + +#### hlfir.finalize + +Motivation: mark end of life of local variables + +Mark the place where a local variable will go out of scope. The main goal is to +retain this information even after local variables are inlined. + +Syntax: +``` +hlfir.finalize %var {fir.ref = var_mangled_name, attributes} +``` + +The attributes can be: +- finalize: indicate that a finalizer should be run on the entity associated + with the variable (There is currently no way to deduce this only from the + variable type in FIR). + +Note that finalization will not free the local variable storage if it was +allocated on the heap. If lowering created the storage passed to fir.declare via +a fir.allocmem, lowering should insert a fir.freemem after the hlfir.finalize. +This could help making fir.allocmem to fir.alloca promotion simpler, and also +because finalization may be run without the intent to deallocate the variable +storage (like on INTENT(OUT) dummies). 
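The copy_out and finalize_copy_in attributes of hlfir.end_association, described earlier, can be pictured with a small model of copy-in/copy-out. The Python sketch below is a conceptual illustration only: `Section`, `associate`, and `end_association` are hypothetical names, a Python list stands in for memory, and the check on storage identity mirrors the stated reason for passing the original variable (detecting whether a temporary was created).

```python
class Section:
    """A strided view of a list, standing in for a Fortran array section."""

    def __init__(self, storage, start, stop, step=1):
        self.storage = storage
        self.indices = range(start, stop, step)
        self.step = step

    def is_contiguous(self):
        return self.step == 1

    def read(self):
        return [self.storage[i] for i in self.indices]

    def write(self, values):
        for i, v in zip(self.indices, values):
            self.storage[i] = v


def associate(section):
    """Model of hlfir.associate to a variable that must be contiguous."""
    if section.is_contiguous():
        return section                      # no copy-in needed
    temp = section.read()                   # copy-in into a contiguous temporary
    return Section(temp, 0, len(temp))


def end_association(var, original, copy_out):
    """Model of hlfir.end_association; returns True if a temporary existed."""
    temporary_created = var.storage is not original.storage
    if temporary_created and copy_out:
        original.write(var.read())          # copy-out back to the original
    # Without copy_out (the finalize_copy_in case) the temporary is just dropped.
    return temporary_created
```

In this model, writes through the associated variable of a strided section only reach the original storage when `end_association` runs with `copy_out=True`, whereas a contiguous section is modified in place and no temporary is involved.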
+ + +#### hlfir.designate + +Motivation: Represent designators at a high level and allow representing some +information about derived type components that would otherwise be lost, like +component lower bounds. + +Represent Fortran designators in a verbatim way: both triplets and component +parts. + +Syntax: +``` +%var = hlfir.designate %base [“component”,] [(%i, %k:%l:%m)] [substr lb, ub] [imag|real] [shape extent1, extent2, ....] [lbs lb1, lb2, .....] [typeparams %l1, ...] {fir.ref = base_mangled_name, fir.def = mangled_name, attributes} +``` + +hlfir.designate is intended to encode a single part-ref (as defined by the +Fortran standard). That means that a(:)%x(i, j, k) must be split into two +hlfir.designate: one for a(:), and one for x(i, j, k). If the base is ranked, +and the component is an array, the subscripts are mandatory and must not +contain triplets. This ensures that the result of a hlfir.designate cannot be a +"super-array". + +The subscripts passed to hlfir.designate must be based on the base lower bounds +(one by default). + +A substring is built by providing the lower and upper character indices after +`substr`. Implicit substring bounds must be made explicit by lowering. It is +not possible to provide substr if a component is already provided. Instead the +related Fortran designator must be split into two hlfir.designate. This is +because the component character length will be needed to compute the right +stride, and it might be lost if not placed on the first designator typeparams. + +Real and Imaginary complex parts are represented by an optional imag or real +tag. It can be added even if there is already a component. + +The shape, lower bound, and type parameter operands represent the output entity +properties. The point of having those made explicit is to allow early folding +and hoisting of array section shape and length parameters (which, especially in +FORALL contexts, can simplify later assignment temporary insertion a lot).
Also, +if lower bounds of a derived type component array could not be added here, they +would be lost since they are not represented by other means in FIR (the fir.type +does not include this information). + +hlfir.designate is not intended to describe vector subscripted variables. +Instead, lowering will have to introduce loops to do element by element +addressing. See the Examples section. This helps keeping hlfir.designate simple, +and since the contexts where a vector subscripted entity is considered to be a +variable (in the sense that it can be modified) are very limited, it seems +reasonable to have lowering deal with this aspect. For instance, a vector +subscripted entity cannot be passed as a variable, it cannot be a pointer +assignment target, and when it appears as an associated entity in an ASSOCIATE, +the related variable cannot be modified. + +#### hlfir.assign + +Motivation: represent assignment at a high-level (mainly a change for array and +character assignment) so that optimization pass can clearly reason about it +(value propagation, inserting temporary for right-hand side evaluation only when +needed), and that lowering does not have to implement it all. + +Syntax: +``` +hlfir.assign %expr_or_var to %var [attributes] +``` + +The attributes can be: + +- realloc: mark that assignment has F2003 semantics and that the left-hand + side may have to be deallocated/reallocated… +- use_assign=@function: mark a user defined assignment +- no_overlap: mark that an assignment does not need a temporary (added by an + analysis pass). +- unordered : mark that an assignment can happen in any element order (not + true if there is an impure elemental function being called). + +This will replace the current array_load/array_access/array_merge semantics. +Instead, a more generic alias analysis will be performed on the LHS and RHS to +detect aliasing, and a temporary inserted if needed. 
The alias analysis will +look at all the memory references in the RHS operand tree and base overlap +decisions on the related variable declaration operations. This same analysis +should later allow moving/merging some expression evaluation between different +statements. + +Note about user defined assignments: semantics resolves them and builds +the related subroutine call. So a fir.call could directly be made in lowering if +the right hand side was always evaluated in a temporary. The motivation to use +hlfir.assign is to help the temporary removal, and also to deal with two edge +cases: user assignment in a FORALL (the forall pass will need to understand that +this is an assignment), and allocatable assignment mixed with user assignment +(implementing this as a call in lowering would require implementing the whole +reallocation logic in lowering, duplicating what hlfir.assign already has to +deal with). + +#### hlfir.ptr_assign + +Motivation: represent pointer assignment without lowering the exact pointer +implementation (descriptor address, fir.ref or simple pointer scalar +fir.llvm_ptr). + +Syntax: +``` +hlfir.ptr_assign %var [[reshape %reshape] | [lbounds %lb1, …., %lbn]] to %ptr +``` + +It is important to keep pointer assignments at a high level so that they can +later correctly be processed in hlfir.forall. + +#### hlfir.allocate + +Motivation: keep POINTER and ALLOCATABLE allocation explicit in HLFIR, while +allowing later lowering to either inlined fir.allocmem or Fortran runtime +calls. Generating runtime calls allows the runtime to do Fortran specific +bookkeeping or flagging and to provide better runtime error reports. + +The main difference with the ALLOCATE statement is that one distinct +hlfir.allocate has to be created for each element of the allocation-list. +Otherwise, it is a naive lowering of the ALLOCATE statement.
+ +Syntax: +``` +%stat = hlfir.allocate %var [%shape] [%type_params] [[src=%source] | [mold=%mold]] [errmsg=%errmsg] +``` + +#### hlfir.deallocate + +Motivation: keep deallocation explicit in HLFIR, while allowing later lowering +to Fortran runtime calls to allow the runtime to do Fortran specific +bookkeeping or flagging of allocations. + +Similarly to hlfir.allocate, one operation must be created for each +allocate-object-list object. + +Syntax: +``` +%stat = hlfir.deallocate %var [errmsg=%errmsg] +``` + +#### hlfir.elemental + +Motivation: represent elemental operations without defining array level +operations for each of them, and allow the representation of array expressions +as functions of the indices. + +The hlfir.elemental operation can be seen as a closure: it is defining a +function of the indices that returns the value of the element of the +represented array expression at the given indices. This is an operation with an +MLIR region. It allows detailing how an elemental expression is implemented at +the element level, without yet requiring materializing the operands and result +in memory. The element values of the hlfir.expr can be obtained using hlfir.apply. + +The element result is built with a fir.result op, whose result type can be a +scalar hlfir.expr or any scalar constant size type (e.g., i32 or f32). + +Syntax: +``` +%op = hlfir.elemental (%indices) %shape [%type_params] [%dynamic_type] { + …. + fir.result %result_element +} +``` + + +Note that %indices are not operands, they are the elemental region block +arguments, representing the array iteration space in a one based fashion. +The choice of using one based indices is to match the Fortran default for +array variables, so that there is no need to generate bound adjustments +when working with one based array variables in an expression. + +Illustration: “A + B” represented with a hlfir.elemental.
+ +``` +%add = hlfir.elemental (%i:index, %j:index) shape %shape (!fir.shape<2>) -> !hlfir.expr<?x?xf32> { + %aelt = hlfir.designate %a, %i, %j {fir.ref = _QPfooEa, fir.def = _QPfooEa.des001} : (!fir.ref<!fir.array<?x?xf32>>, index, index) -> !fir.ref<f32> + %belt = hlfir.designate %b, %i, %j {fir.ref = _QPfooEb, fir.def = _QPfooEb.des002} : (!fir.ref<!fir.array<?x?xf32>>, index, index) -> !fir.ref<f32> + %aval = fir.load %aelt : (!fir.ref<f32>) -> f32 + %bval = fir.load %belt : (!fir.ref<f32>) -> f32 + %res = arith.addf %aval, %bval : f32 + fir.result %res : f32 +} +``` + +In contexts where it can be proved that the array operands were not modified +between the hlfir.elemental and the hlfir.apply, the region of the +hlfir.elemental can be inlined at the hlfir.apply. Otherwise, if there is no +such guarantee, or if the hlfir.elemental is not “visible” (because its result +is passed as a block argument), the hlfir.elemental will be lowered to an array +temporary. This will be done as a HLFIR to HLFIR optimization pass. Note that +MLIR inlining could be used if hlfir.elemental implemented the +CallableOpInterface and hlfir.apply the CallOpInterface. But MLIR generic inlining +is probably too generic for this case: no recursion is possible here, the call +graphs are trivial, and using MLIR inlining here could introduce later +conflicts or make normal function inlining more complex because FIR inlining +hooks would already be used. + +hlfir.elemental allows delaying elemental array expression buffering and +combination. Its generic aspect has two advantages: +- It avoids defining one operation per elemental operation or intrinsic; + instead, the related arith dialect operations can be used directly in the + elemental regions. This avoids growing HLFIR and having to maintain about a + hundred operations. +- It allows representing transformational intrinsics as functions of the indices + while doing optimization as described in + [Array Composition](ArrayComposition.md).
This is because the indices can be + transformed inside the region before being applied to array variables + according to any kind of transformation (semi-affine or not). + + +#### Introducing the hlfir.apply operation + +Motivation: provide a way to get the element of an array expression +(hlfir.expr) + +This is the addressing equivalent for expressions. A notable difference is that +it can only take simple scalar indices (no triplets) because it is not clear +why supporting triplets would be needed, and keeping the indexing simple makes +inlining of hlfir.elemental much easier. + +If hlfir.elemental inlining is not performed, or if the hlfir.expr array +expression is produced by another operation (like fir.intrinsic) that is not +rewritten, hlfir.apply will be lowered to an actual addressing operation that +will address the temporary that was created for the hlfir.expr value that +was materialized in memory. + +hlfir.apply indices will be one based to make further lowering simpler. + +Syntax: +``` +%element = hlfir.apply %array_expr, %i, %j: (hlfir.expr) -> i32 +``` + + +#### Introducing operations for transformational intrinsic functions + +Motivation: Represent transformational intrinsic functions at a high level so +that they can be manipulated easily by the optimizer, and do not require +materializing the result as a temporary in lowering. + +An operation will be added for each Fortran transformational function (SUM, +MATMUL, TRANSPOSE, ...). It translates the Fortran expression verbatim: it takes +the same number of arguments as the Fortran intrinsic and returns a +hlfir.expr. The arguments may be hlfir.expr, simple scalar types (e.g., +i32, f32), or variables. + +The one exception is that statically absent arguments are still +passed to the operation (as results of fir.absent operations), so that the arguments +can be identified via their positions.
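The closure view of hlfir.elemental, the one based hlfir.apply, and the idea of expressing a transformational intrinsic such as TRANSPOSE as a function of the indices can be modeled in a few lines. The Python sketch below is a conceptual model under stated assumptions: `Elemental`, `apply`, `add`, `transpose`, and `materialize` are hypothetical names, nested lists stand in for array variables, and no claim is made about how the actual passes would be implemented.

```python
class Elemental:
    """Toy model of hlfir.elemental: a shape and a function of one based indices."""

    def __init__(self, shape, body):
        self.shape = shape      # tuple of extents
        self.body = body        # callable taking one based scalar indices


def apply(expr, *indices):
    """Model of hlfir.apply: scalar, one based indices only (no triplets)."""
    assert len(indices) == len(expr.shape)
    assert all(1 <= i <= n for i, n in zip(indices, expr.shape))
    return expr.body(*indices)


def add(a, b):
    """A + B on two rank-2 arrays, without evaluating any element yet."""
    shape = (len(a), len(a[0]))
    return Elemental(shape, lambda i, j: a[i - 1][j - 1] + b[i - 1][j - 1])


def transpose(expr):
    """TRANSPOSE as a function of the indices: swap them, no temporary."""
    rows, cols = expr.shape
    return Elemental((cols, rows), lambda i, j: apply(expr, j, i))


def materialize(expr):
    """Fallback when inlining is not possible: evaluate into a temporary."""
    rows, cols = expr.shape
    return [[apply(expr, i, j) for j in range(1, cols + 1)]
            for i in range(1, rows + 1)]
```

Because the closures read `a` and `b` lazily, modifying an operand between the creation of the `Elemental` and a later `apply` changes the result in this model. That is the situation in which the text requires a temporary instead of inlining.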
+ +This operation is meant for the transformational intrinsics, not the elemental +intrinsics, that will be implemented using hlfir.elemental + mlir math dialect +operations, nor the intrinsic subroutines (like random_seed or system_clock), +that will be directly lowered in lowering. + +Syntax: +``` +%res = hlfir."intrinsic_name" %expr_or_var, ... +``` + +These operations will all inherit a same operation base in tablegen to make +their definition and identification easy. + +Without any optimization, codegen would then translate the operations to +exactly the same FIR as currently generated by IntrinsicCall.cpp (runtime calls +or inlined code with temporary allocation for array results). The fact that +they are the verbatim Fortran translations should allow to move the lowering +code to a translation pass without massive changes. + +An operation will at least be created for each of the following transformational +intrinsics: all, any, count, cshift, dot_product, eoshift, findloc, iall, iany, +iparity, matmul, maxloc, maxval, minloc, minval, norm2, pack, parity, product, +reduce, repeat, reshape, spread, sum, transfer, transpose, trim, unpack. + +For the following transformational intrinsics, the current lowering to runtime +call will probably be used since there is little point to keep them high level: +- command_argument_count, get_team, null, num_images, team_number, this_image + that are more program related (and cannot appear for instance in constant + expressions) +- selected_char_kind, selected_int_kind, selected_real_kind that returns scalar + integers + +#### Introducing operations for character operations and elemental intrinsic functions + + +Motivation: represent character operations without requiring the operand and +results to be materialized in memory. 
+ +hlfir.char_op is intended to represent: +- Character concatenation (//) +- Character MIN/MAX +- Character MERGE +- "SET_LENGTH" +- Character conversions +- REPEAT +- INDEX +- CHAR +- Character comparisons +- LEN_TRIM + +The arguments must be scalars, the elemental aspect should be handled by a +hlfir.elemental operation. + +Syntax: +``` +%res = hlfir."char_op" %expr_or_var +``` + +Just like for the transformational intrinsics, if no optimization occurs, these +operations will be lowered to memory operations with temporary results (if the +result is a character), using the same generation code as the one currently used +in lowering. + +#### hlfir.array_ctor + +Motivation: represent array constructors without creating temporaries + +Many array constructors have a limited number of elements (less than 10), yet +the current lowering of array constructors is rather complex because it must +deal with the generic cases. + +Having a dedicated representation for array constructors will allow an easier +lowering of array constructors, and make them a lot easier to manipulate. +For instance, for small array constructors, loops could be unrolled with +the array ctor elements without ever creating a dynamically allocated array +temporary and a loop nest using it. + +Syntax: +``` +%array_ctor = hlfir.array_ctor %expr1, %expr2 ... +``` + +Note that hlfir.elemental could be used to implement some ac-implied-do, +although this is not yet clarified since ac-implied-do may contain more than +one scalar element (they may contain a list of scalar and array values, which +would render the representation in a hlfir.elemental tricky, but maybe not +impossible using if/then/else and hlfir.elemental nests using the index value). 
+One big issue though is that hlfir.elemental requires the result shape to be +pre-computed (it is an operand), and with an ac-implied-do containing user +transformational calls returning allocatable or pointer arrays, it is +impossible to pre-evaluate the shape without evaluating all the function calls +entirely (and therefore all the array constructor elements). + +#### hlfir.get_extent + +Motivation: inquire about the extent of a hlfir.expr, variable, or fir.shape + +Syntax: +``` +%extent = hlfir.get_extent %shape_expr_or_var, dim +``` + +dim is a constant integer attribute. + +This allows inquiring about the extents of expressions whose shape may not be +yet computable without generating detailed, low level operations (e.g, for some +transformational intrinsics), or to avoid going into low level details for +pointer and allocatable variables (where the descriptor needs to be read and +loaded). + +#### hlfir.get_typeparam + +Motivation: inquire about the type parameters of a hlfir.expr, or variable. + +Syntax: +``` +%param = hlfir.get_typeparam %expr_or_var [, param_name] +``` +- param_name is an optional string attribute that must contain the length + parameter name if %expr_or_var is a derived type. + +#### hlfir.get_dynamic_type + +Motivation: inquire about the dynamic type of a polymorphic hlfir.expr or +variable. + +Syntax: +``` +%dynamic_type = hlfir.get_dynamic_type %expr_or_var +``` + +#### hlfir.get_lbound + +Motivation: inquire about the lower bounds of variables without digging into +the implementation details of pointers and allocatables. + +Syntax: +``` +%lb = hlfir.get_lbound %var, n +``` + +Note: n is an integer constant attribute for the (zero based) dimension. + +#### hlfir.shape_meet + +Motivation: represent conformity requirement/information between two array +operands so that later optimization can choose the best shape information +source, or insert conformity runtime checks. 
+ +Syntax: +``` +%shape = hlfir.shape_meet %shape1, %shape2 +``` + +Suppose A(n), B(m) are two explicit shape arrays. Currently, when A+B is +lowered, lowering chose which operand shape gives the result shape information, +and it is later not retrievable that both n and m can be used. If lowering +chose n, but m later gets folded thanks to inlining or constant propagation, the +optimization passes have no way to use this constant information to optimize the +result storage allocation or vectorization of A+B. hlfir.shape_meet intends to +delay this choice until constant propagation or inlining can provide better +information about n and m. + +#### hlfir.forall + +Motivation: segregate the Forall lowering complexity in its own unit. + +Forall is tough to lower because: +- Lowering it in an optimal way requires analyzing several assignments/mask + expressions. +- The shape of the temporary needed to store intermediate evaluation values is + not a Fortran array in the general case, and cannot in the general case be + maximized/pre-computed without executing the forall to compute the bounds of + inner forall, and the shape of the assignment operands that may depend on + the bound values. +- Mask expressions evaluation should be affected by previous assignment + statements, but not by the following ones. Array temporaries may be + required for the masks to cover this. +- On top of the above points, Forall can contain user assignments, pointer + assignments, and assignment to whole allocatable. + + +The hlfir.forall syntax would be exactly the one of a fir.do_loop. The +difference would be that hlfir.assign and hlfir.ptr_assign inside hlfir.forall +have specific semantics (the same as in Fortran): +- Given one hlfir.assign, all the iteration values of the LHS/RHS must be + evaluated before the assignment of any value is done. +- Given two hlfir.assign, the first hlfir.assign must be fully performed + before any evaluation of the operands of the second assignment is done. 
+- Masks (fir.if arguments), if any, should be evaluated before any nested + assignments. Any assignments syntactically before the where mask occurrence + must be performed before the mask evaluation. + +Note that forall forbids impure function calls, hence, no calls should modify +any other expression evaluation and can be removed if unused. + +The translation of hlfir.forall will happen by: +- 1. Determining if the where masks value may be modified by any assignments + - Yes, pre-compute all masks in a pre-run of the forall loop, creating + a “forall temps” (we may need a FIR concept to help here). + - No, Do nothing (or indicate it is safe to evaluate masks while evaluating + the rest). +- 2. Determining if a hlfir.assign operand expression depends on the + previous hlfir.assign left-hand side base value. + - Yes, split the hlfir.assign into their own nest of hlfir.forall loops. + - No, do nothing (or indicate it is safe to evaluate the assignment while + evaluating previous assignments) +- 3. For each assignments, check if the RHS/LHS operands value may depend + on the LHS base: + - Yes, split the forall loops. Insert a “forall temps” before the loops for + the “smallest” part that may overlap (which may be the whole RHS, or some + RHS sub-part, or some LHS indices). In the first nest, evaluate this + overlapping part into the temp. In the next forall loop nest, modify the + assignment to use the temporary, and add the [no_overlap] flag to indicate + no further temporary is needed. Insert code to finalize the temp after its + usage. + +### Tagging variable uses in high-level operations (fir.ref attribute) + +All operations defined above that accept "variables" (i.e: memory addresses or +box values that were produced by fir.declare, hlfir.associate, or +hlfir.designate) must have a fir.ref = mangled_name_attribute that matches the +fir.def on the operation that created them (it will be added automatically by +the operation builder). 
That is to ensure optimization passes do not merge +seemingly identical operations using variables with different properties, and +also to ensure that the matching defining operation can always be retrieved to +get all the variable properties (shape, bounds, type parameters and +attributes). + +Two other alternatives have been considered and rejected: +- Using MLIR symbols. This has been rejected because MLIR symbols are mainly + intended to deal with globals and functions that may refer to each other + before being defined. Their processing is not as light as normal values, and + would require turning every FIR operation with a region into an MLIR symbol + table. This would especially be annoying given hlfir.designate also produces + variables with their own properties, which would imply creating a lot of MLIR + symbols. All the operations that accept both variable and expression operands + would also either need to be more complex in order to accept both SSA values + and MLIR symbol operands (or some fir.as_expr %var operation should be added to + turn a variable into an expression). Given all variable definitions will + dominate their uses, it seems more adequate to use an SSA model with named + attributes. Using SSA values also makes the transition and mix with + lower-level FIR operations smoother: a variable SSA usage can simply be + replaced by lower-level FIR operations using the same SSA value. +- Another alternative could be making all operations defining variables return + fir.box, and repeating the variable attributes (fir.target...) on all + operations using the variable. This would allow the link between the variable + definition and usage to be broken (variables could travel as block + arguments). But this would risk littering the codegen with fir.box + manipulations (creating, writing and reading to descriptors) that may lead to + poor performance. 
Maintaining all the attributes on the operations would also + be more cumbersome than only maintaining the variable name in the fir.ref + attribute. + +Lower-level operations (the current FIR operations), do not require this strong +link between a memory address and the variable definition, and it will not be +necessary to add fir.ref attributes to those. During alias analysis on FIR using +lower-level operations (like loads and stores), any memory reference that cannot +be resolved to a Fortran variable or some unrelated temporary allocation is +considered as potentially overlapping. + +The variable definition will be guaranteed to have a unique name after lowering, +and some care might have to be taken when later duplicating regions that define +variables in a way that could lead a variable usage to have two dominating +definitions with the same name (this could for instance happen after inlining +two calls to the same procedure inside the same region). Inlining will need to +take care of those conflicts. This could be done by randomizing the inlined +variable name attributes (like by adding a counter index that is incremented +after each call inlining). + +## New HLFIR Transformation Passes + +### Mandatory Passes (translation towards lower-level representation) + +Note that these passes could be implemented as a single MLIR pass, or successive +passes. + +- Forall rewrites (getting rid of hlfir.forall) +- Array assignment rewrites (getting rid of array hlfir.assign) +- Bufferization: expression temporary materialization (getting rid of + hlfir.expr, and all the operations that may produce it like transformational + intrinsics and hlfir.elemental, hlfir.apply). 
+- Call interface argument association lowering (getting rid of hlfir.associate + and hlfir.end_associate) +- Lowering high level operations using variables into FIR operations + operating on memory (translating hlfir.designate, scalar hlfir.assign, + hlfir.finalize into fir.array_coor, fir.do_loop, fir.store, fir.load, + fir.embox/fir.rebox operations). + +Note that these passes do not have to be the first ones run after lowering. It is +intended that CSE, DCE, algebraic simplification, inlining and some other new +high-level optimization passes discussed below be run before doing any of these +translations. + +After that, the current FIR pipeline could be used to continue lowering towards +LLVM. + +### Optimization Passes + +- Elemental expression inlining (inlining of hlfir.elemental in hlfir.apply) +- User function inlining +- Transformational intrinsic rewrites as hlfir.elemental expressions +- Assignments propagation +- Shape/Rank/dynamic type propagation + +These high level optimization passes can be run any number of times in any +order. + +## Transition Plan + +The new higher-level steps proposed in this document will require significant +refactoring of lowering. Codegen should not be impacted since the current FIR +will remain untouched. + +A lot of the code in lowering generating Fortran features (like an intrinsic or +how to do assignments) is based on the fir::ExtendedValue concept. This +currently is a collection of mlir::Value that allows describing a Fortran object +(either a variable or an evaluated expression result). The variable and +expression concepts described above should allow keeping an interface very +similar to the fir::ExtendedValue, but having the fir::ExtendedValue wrap a +single value or mlir::Operation* from which all of the object entity +information can be inferred. 
+ +That way, all the helpers currently generating FIR from fir::ExtendedValue could +be kept and used with the new variable and expression concepts with as little +modification as possible. + +The proposed plan is to: +- 1. Introduce the new HLFIR operations. +- 2. Refactor fir::ExtendedValue so that it can work with the new variable and + expression concepts (requires part of 1.). +- 3. Introduce the new translation passes, using the fir::ExtendedValue helpers + (requires 1.). +- 3.b Introduce the new optimization passes (requires 1.). +- 4. Introduce the fir.declare and hlfir.finalize usage in lowering (requires 1. + and 2. and part of 3.). + +The following steps might have to be done in parallel with the current lowering, +to avoid disturbing the work on performance until the new lowering is complete +and on par. + +- 5. Introduce hlfir.designate and hlfir.associate usage in lowering. +- 6. Introduce lowering to hlfir.assign (with RHS that is not a hlfir.expr), + hlfir.ptr_assign. +- 7. Introduce lowering to hlfir.expr and related operations. +- 8. Introduce lowering to hlfir.forall. + +At that point, lowering using the high-level FIR should be in place, allowing +extensive testing. +- 9. Debugging correctness. +- 10. Debugging execution performance. + +The plan is to do these steps incrementally upstream. For lowering, however, it +will most likely be safer to implement the new expression lowering in parallel +upstream and to add an option to use it, rather than to directly modify the +current expression lowering and keep it functionally and performance-wise +equivalent at every step. 
+ +## Examples + +### Example 1: simple array assignment + +```Fortran +subroutine foo(a, b) + real :: a(:), b(:) + a = b +end subroutine +``` + +Lowering output: + +```HLFIR +func.func @_QPfoo(%arg0: !fir.box>, %arg1: !fir.box>) { + %a = fir.declare %arg0 {fir.def = "_QPfooEa"} : !fir.box> + %b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box> + hlfir.assign %b to %a {fir.ref = "_QPfooEb,_QPfooEa"}: !fir.box> + return +} +``` + +HLFIR array assignment lowering pass: +- Query: can %b value depend on %a? No, they are two different argument + associated variables that are neither target nor pointers. +- Lower the assignment to a loop: + +```HLFIR +func.func @_QPfoo(%arg0: !fir.box>, %arg1: !fir.box>) { + %a = fir.declare %arg0 {fir.def = "_QPfooEa"} : !fir.box> + %b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box> + + %ashape = hlfir.shape_of %a {fir.ref = "_QPfooEa"} + %bshape = hlfir.shape_of %b {fir.ref = "_QPfooEb"} + %shape = hlfir.shape_meet %ashape, %bshape + %extent = hlfir.get_extent %shape, 0 + + %c1 = arith.constant 1 : index + + fir.do_loop %i = %c1 to %extent step %c1 unordered { + %belt = hlfir.designate %b, %i {fir.ref = "_QPfooEb", fir.def = "_QPfooEb.des001"} + %aelt = hlfir.designate %a, %i {fir.ref = "_QPfooEa", fir.def = "_QPfooEa.des002"} + hlfir.assign %belt to %aelt {fir.ref = "_QPfooEb.des001,_QPfooEa.des002"}: fir.ref, fir.ref + } + return +} +``` + +HLFIR variable operations to memory translation pass: +- hlfir.designate is rewritten into fir.array_coor operation on the variable + associated memory buffer, and returns the element address +- For numerical scalar, hlfir.assign is rewritten to fir.store (and fir.load + of the operand if needed), for derived type and characters, memory copy + (and padding for characters) is done. 
+- hlfir.shape_of are lowered to fir.box_dims, here, no constant information + was obtained from any of the source shape, so hlfir.shape_meet is a no-op, + selecting the first shape (a conformity runtime check could be inserted + under debug options). +- fir.declare are kept (they are no-ops) so that it will be possible to + generate debug information for LLVM. + +This pass would wrap operations defining variables (fir.declare/hlfir.designate) +as fir::ExtendedValue, and use all the current helpers operating on it +(e.g.: fir::factory::genScalarAssignment). + +``` +func.func @_QPfoo(%arg0: !fir.box>, %arg1: + !fir.box>) { + %a = fir.declare %arg0 {fir.def = "_QPfooEa"} : !fir.box> + %b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box> + %c1 = arith.constant 1 : index + %dims = fir.box_dims %a, 1 + fir.do_loop %i = %c1 to %dims#1 step %c1 unordered { + %belt = fir.array_coor %b, %i : (!fir.box>, index) -> fir.ref + %aelt = fir.array_coor %a, %i : (!fir.box>, index) -> fir.ref + %bval = fir.load %belt : f32 + fir.store %bval to %aelt : fir.ref + } + return +} +``` + +This reaches the current FIR level (except fir.declare_op that can be kept until +LLVM codegen and dropped on the floor if there is no debug information +generated). 
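The shape selection that hlfir.shape_meet defers can be modeled in a few lines of Python. This is a minimal sketch, assuming extents are either Python integers (compile-time constants) or strings naming runtime values; the function `shape_meet` below is a hypothetical helper for illustration and not the operation's actual folding logic.

```python
def shape_meet(shape1, shape2):
    """Combine two conforming shapes, preferring compile-time constant extents.

    Extents are ints (known constants) or strings (names of runtime values).
    """
    if len(shape1) != len(shape2):
        raise ValueError("rank mismatch")
    result = []
    for e1, e2 in zip(shape1, shape2):
        const1, const2 = isinstance(e1, int), isinstance(e2, int)
        if const1 and const2 and e1 != e2:
            raise ValueError(f"non-conforming extents: {e1} vs {e2}")
        # Pick the constant extent when only the second operand has one;
        # otherwise fall back to the first operand, as in Example 1.
        result.append(e2 if (const2 and not const1) else e1)
    return tuple(result)

no_info = shape_meet(("n",), ("m",))        # nothing known: first shape is kept
after_folding = shape_meet(("n",), (100,))  # m was folded to 100 by a later pass
```

In Example 1 neither extent is constant, so the first shape is selected. If a later pass folds one extent to a constant, re-running the selection picks the constant, which is the information hlfir.shape_meet is meant to preserve.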
+ +### Example 2: array assignment with elemental expression + +```Fortran +subroutine foo(a, b, p, c) + real, target :: a(:) + real :: b(:), c(100) + real, pointer :: p(:) + a = b*p + c +end subroutine +``` + +Lowering output: + +```HLFIR +func.func @_QPfoo(%arg0: !fir.box>, %arg1: !fir.box>, %arg2: !fir.box>>, %arg3: !fir.ref>) { + %a = fir.declare %arg0 {fir.def = "_QPfooEa"} {fir.target} : !fir.box + %b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box> + %p = fir.declare %arg2 {fir.def = "_QPfooEp", fir.ptr} : !fir.box>> + %c = fir.declare %arg3 {fir.def = "_QPfooEc"} : !fir.ref> + %bshape = hlfir.shape_of %b {fir.ref = "_QPfooEb"} + %pshape = hlfir.shape_of %p {fir.ref = "_QPfooEp"} + %shape1 = hlfir.shape_meet %bshape, %pshape + %mul = hlfir.elemental(%i:index) %shape1 { + %belt = hlfir.designate %b, %i {fir.ref = "_QPfooEb", fir.def= "_QPfooEb.des001"} + %p_lb = hlfir.get_lbound %p, 1 {fir.ref = "_QPfooEp"} + %i_zero = arith.subi %i, %c1 + %i_p = arith.addi %i_zero, %p_lb + %pelt = hlfir.designate %p, %i_p {fir.ref = "_QPfooEp", fir.def= "_QPfooEp.des002"} + %bval = fir.load %belt : f32 + %pval = fir.load %pelt : f32 + %mulres = arith.mulf %bval, %pval : f32 + fir.result %mulres : f32 + } + %cshape = hlfir.shape_of %c + %shape2 = hlfir.shape_meet %cshape, %shape1 + %add = hlfir.elemental(%i:index) %shape2 { + %mulval = hlfir.apply %mul, %i : f32 + %celt = hlfir.designate %c, %i {fir.ref = "_QPfooEc", fir.def= "_QPfooEc.des003"} + %cval = fir.load %celt + %add_res = arith.addf %mulval, %cval + fir.result %add_res + } + hlfir.assign %add to %a {fir.ref = "_QPfooEa"} : hlfir.expr, !fir.box + return +} +``` + +Step 1: hlfir.elemental inlining: inline the first hlfir.elemental into the +second one at the hlfir.apply. 
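The effect of this inlining can be pictured with a small Python model in which an hlfir.elemental is a deferred function of a one-based index and hlfir.apply simply calls it. The class `Elemental` and the sample data are assumptions made for illustration; lower bounds of the pointer `p` are ignored here.

```python
class Elemental:
    """Deferred array expression: a body evaluated per one-based index."""

    def __init__(self, extent, body):
        self.extent = extent
        self.body = body

    def apply(self, i):
        # Model of hlfir.apply: evaluate the element at index i on demand.
        return self.body(i)

    def materialize(self):
        # Model of bufferization: evaluate every element into storage.
        return [self.body(i) for i in range(1, self.extent + 1)]

b = [1.0, 2.0, 3.0]
p = [10.0, 20.0, 30.0]
c = [0.5, 0.5, 0.5]

# %mul and %add from the lowering output above, as deferred expressions.
mul = Elemental(3, lambda i: b[i - 1] * p[i - 1])
add = Elemental(3, lambda i: mul.apply(i) + c[i - 1])

# Only the final result is materialized; b*p never gets its own array.
result = add.materialize()
```

Because `add` calls `mul.apply(i)` inside its body, evaluating `add` computes `b(i)*p(i) + c(i)` element by element, which is what the inlined hlfir.elemental expresses directly.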
+ + +```HLFIR +func.func @_QPfoo(%arg0: !fir.box>, %arg1: !fir.box>, %arg2: !fir.box>>, %arg3: !fir.ref>) { + %a = fir.declare %arg0 {fir.def = "_QPfooEa"} {fir.target} : !fir.box + %b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box> + %p = fir.declare %arg2 {fir.def = "_QPfooEp", fir.ptr} : !fir.box>> + %c = fir.declare %arg3 {fir.def = "_QPfooEc"} : !fir.ref> + %bshape = hlfir.shape_of %b {fir.ref = "_QPfooEb"} + %pshape = hlfir.shape_of %p {fir.ref = "_QPfooEp"} + %shape1 = hlfir.shape_meet %bshape, %pshape + %cshape = hlfir.shape_of %c + %shape2 = hlfir.shape_meet %cshape, %shape1 + %add = hlfir.elemental(%i:index) %shape2 { + %belt = hlfir.designate %b, %i {fir.ref = "_QPfooEb", fir.def= "_QPfooEb.des001"} + %p_lb = hlfir.get_lbound %p, 1 {fir.ref = "_QPfooEp"} + %i_zero = arith.subi %i, %c1 + %i_p = arith.addi %i_zero, %p_lb + %pelt = hlfir.designate %p, %i_p {fir.ref = "_QPfooEp", fir.def= "_QPfooEp.des002"} + %bval = fir.load %belt : f32 + %pval = fir.load %pelt : f32 + %mulval = arith.mulf %bval, %pval : f32 + %celt = hlfir.designate %c, %i {fir.ref = "_QPfooEc", fir.def= "_QPfooEc.des003"} + %cval = fir.load %celt + %add_res = arith.addf %mulval, %cval + fir.result %add_res + } + hlfir.assign %add to %a {fir.ref = "_QPfooEa"} : hlfir.expr, !fir.box + return +} +``` + +Step 2: alias analysis around the array assignment: + +- May %add value depend on %a variable? +- Gather variable and function calls in %add operand tree (visiting + hlfir.elemental regions) +- Gather references to %b, %p, and %c. %p is a pointer variable according to + its defining operations. It may alias with %a that is a target. -> answer + yes. +- Insert temporary, and duplicate array assignments, that can be lowered to + loops at that point + +Note that the alias analysis could have already occurred without inlining the +%add hlfir.elemental. 
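The query performed in this step can be sketched in Python as a conservative check on the attributes recorded by the defining operations. The `Var` class and both helper functions are hypothetical; a real analysis would also use types, ranks, and designator information, which this sketch ignores.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    """Variable properties as recorded by its defining operation (assumed model)."""
    name: str
    target: bool = False
    pointer: bool = False

def may_alias(x, y):
    """Conservative answer: True unless aliasing is ruled out by the attributes."""
    if x.name == y.name:
        return True
    # A pointer may be associated with any target or with another pointer.
    if x.pointer and (y.target or y.pointer):
        return True
    if y.pointer and (x.target or x.pointer):
        return True
    # Distinct variables that are neither pointers nor targets are assumed
    # not to overlap.
    return False

def rhs_may_depend_on_lhs(lhs, rhs_vars):
    return any(may_alias(lhs, v) for v in rhs_vars)

a = Var("a", target=True)
b = Var("b")
p = Var("p", pointer=True)
c = Var("c")

# Variables gathered from the %add operand tree: %b, %p and %c.
needs_temp = rhs_may_depend_on_lhs(a, [b, p, c])
```

Here the answer is yes because `p` is a pointer and `a` is a target, so a temporary is inserted. Without `p` on the right-hand side, the same query would answer no.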
+ + +```HLFIR +func.func @_QPfoo(%arg0: !fir.box>, %arg1: !fir.box>, %arg2: !fir.box>>, %arg3: !fir.ref>) { + %a = fir.declare %arg0 {fir.def = "_QPfooEa"} {fir.target} : !fir.box + %b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box> + %p = fir.declare %arg2 {fir.def = "_QPfooEp", fir.ptr} : !fir.box>> + %c = fir.declare %arg3 {fir.def = "_QPfooEc"} : !fir.ref> + %bshape = hlfir.shape_of %b {fir.ref = "_QPfooEb"} + %pshape = hlfir.shape_of %p {fir.ref = "_QPfooEp"} + %shape1 = hlfir.shape_meet %bshape, %pshape + %cshape = hlfir.shape_of %c + %shape2 = hlfir.shape_meet %cshape, %shape1 + %add = hlfir.elemental(%i:index) %shape2 { + %belt = hlfir.designate %b, %i {fir.ref = "_QPfooEb", fir.def= "_QPfooEb.des001"} + %p_lb = hlfir.get_lbound %p, 1 {fir.ref = "_QPfooEp"} + %i_zero = arith.subi %i, %c1 + %i_p = arith.addi %i_zero, %p_lb + %pelt = hlfir.designate %p, %i_p {fir.ref = "_QPfooEp", fir.def= "_QPfooEp.des002"} + %bval = fir.load %belt : f32 + %pval = fir.load %pelt : f32 + %mulval = arith.mulf %bval, %pval : f32 + %celt = hlfir.designate %c, %i {fir.ref = "_QPfooEc", fir.def= "_QPfooEc.des003"} + %cval = fir.load %celt + %add_res = arith.addf %mulval, %cval + fir.result %add_res + } + %extent = hlfir.get_extent %shape2, 0: (fir.shape<1>) -> index + %tempstorage = fir.allocmem %extent : fir.heap> + %temp = fir.declare %tempstorage, shape %extent {fir.def = QPfoo.temp001} : (index) -> fir.heap> + hlfir.assign %add to %temp : no_overlap {fir.ref = "QPfoo.temp001"} : hlfir.expr, !fir.box + hlfir.assign %temp to %a : no_overlap {fir.ref = " QPfoo.temp001,_QPfooEa"} : hlfir.expr, !fir.box + hlfir.finalize %temp {fir.ref = "QPfoo.temp001"} + fir.freemem %tempstorage + return +} +``` + +Step 4: Lower assignments to regular loops since they have the no_overlap +attribute, and inline the hlfir.elemental into the first loop nest. 
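Why the temporary and the two separate loop nests matter can be seen on a small Python model of `a = b*p + c`. The model assumes, purely for illustration, that `p` is associated with `a` in reverse order, so reading `p(i)` reads `a(n+1-i)`; the function names are hypothetical.

```python
def assign_direct(a, b, c, p_index):
    """Single loop without temporary: wrong when p overlaps a."""
    for i in range(len(a)):
        a[i] = b[i] * a[p_index(i)] + c[i]

def assign_with_temp(a, b, c, p_index):
    """Two loop nests: evaluate the whole RHS into a temporary, then copy it."""
    temp = [b[i] * a[p_index(i)] + c[i] for i in range(len(a))]
    for i in range(len(a)):
        a[i] = temp[i]

n = 3
b = [1.0, 1.0, 1.0]
c = [0.0, 0.0, 0.0]
reversed_view = lambda i: n - 1 - i  # p is assumed to point to a(n:1:-1)

a_direct = [1.0, 2.0, 3.0]
assign_direct(a_direct, b, c, reversed_view)

a_temp = [1.0, 2.0, 3.0]
assign_with_temp(a_temp, b, c, reversed_view)
```

Fortran semantics require the right-hand side to be fully evaluated before any element of `a` is modified. Only the version with the temporary yields the reversed array; the single loop reads elements it has already overwritten.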
+ +```HLFIR +func.func @_QPfoo(%arg0: !fir.box>, %arg1: !fir.box>, %arg2: !fir.box>>, %arg3: !fir.ref>) { + %a = fir.declare %arg0 {fir.def = "_QPfooEa"} {fir.target} : !fir.box + %b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box> + %p = fir.declare %arg2 {fir.def = "_QPfooEp", fir.ptr} : !fir.box>> + %c = fir.declare %arg3 {fir.def = "_QPfooEc"} : !fir.ref> + %bshape = hlfir.shape_of %b {fir.ref = "_QPfooEb"} + %pshape = hlfir.shape_of %p {fir.ref = "_QPfooEp"} + %shape1 = hlfir.shape_meet %bshape, %pshape + %cshape = hlfir.shape_of %c + %shape2 = hlfir.shape_meet %cshape, %shape1 + %extent = hlfir.get_extent %shape2, 0: (fir.shape<1>) -> index + %tempstorage = fir.allocmem %extent : fir.heap> + %temp = fir.declare %tempstorage, shape %extent {fir.def = "_QPfoo.temp001"} : (index) -> fir.heap> + fir.do_loop %i = %c1 to %extent step %c1 unordered { + %belt = hlfir.designate %b, %i {fir.ref = "_QPfooEb", fir.def= "_QPfooEb.des001"} + %p_lb = hlfir.get_lbound %p, 1 {fir.ref = "_QPfooEp"} + %i_zero = arith.subi %i, %c1 + %i_p = arith.addi %i_zero, %p_lb + %pelt = hlfir.designate %p, %i_p {fir.ref = "_QPfooEp", fir.def= "_QPfooEp.des002"} + %bval = fir.load %belt : f32 + %pval = fir.load %pelt : f32 + %mulval = arith.mulf %bval, %pval : f32 + %celt = hlfir.designate %c, %i {fir.ref = "_QPfooEc", fir.def= "_QPfooEc.des003"} + %cval = fir.load %celt + %add_res = arith.addf %mulval, %cval + %tempelt = hlfir.designate %temp, %i {fir.ref = "_QPfoo.temp001", fir.def="_QPfoo.temp001.des004"} + hlfir.assign %add_res to %tempelt {fir.ref = "_QPfoo.temp001.des004"}: f32, fir.ref + } + fir.do_loop %i = %c1 to %extent step %c1 unordered { + %aelt = hlfir.designate %a, %i {fir.ref = "_QPfooEa", fir.def= "_QPfooEa.des005"} + %tempelt = hlfir.designate %temp, %i {fir.ref = "_QPfoo.temp001", fir.def="_QPfoo.temp001.des006"} + hlfir.assign %tempelt to %aelt {fir.ref = "_QPfoo.temp001.des006,_QPfooEa.des005"}: fir.ref, fir.ref + } + hlfir.finalize %temp {fir.ref = "_QPfoo.temp001"} + 
fir.freemem %tempstorage + return +} +``` + +Step 5 (may also occur earlier or several times): shape propagation. +- %shape2 can be inferred from %cshape that has constant shape: the + hlfir.shape_meet results can be replaced by it, and if the option is set, + conformance checks can be added for %a, %b and %p. +- %temp is small, and its fir.allocmem can be promoted to a stack allocation + +```HLFIR +func.func @_QPfoo(%arg0: !fir.box>, %arg1: !fir.box>, %arg2: !fir.box>>, %arg3: !fir.ref>) { + // ..... + %cshape = fir.shape %c100 + %extent = %c100 + // updated fir.alloca + %tempstorage = fir.alloca %extent : fir.ref> + %temp = fir.declare %tempstorage {fir.def = "_QPfoo.temp001"} : fir.ref> + fir.do_loop %i = %c1 to %c100 step %c1 unordered { + // ... + } + fir.do_loop %i = %c1 to %c100 step %c1 unordered { + // ... + } + hlfir.finalize %temp {fir.ref = "QPfoo.temp001"} + // deleted fir.freemem %tempstorage + return +} +``` + +Step 6: lower hlfir.designate/hlfir.assign in a translation pass: + +At this point, the representation is similar to the current representation after +the array value copy pass, and the existing FIR flow is used (lowering +fir.do_loop to cfg and doing codegen to LLVM). + +### Example 3: assignments with vector subscript + +```Fortran +subroutine foo(a, b, v) + real :: a(*), b(*) + integer :: v(:) + a(v) = b(v) +end subroutine +``` + +Lowering of vector subscripted entities would happen as follow: +- vector subscripted entities would be lowered as a hlfir.elemental implementing + the vector subscript addressing. +- If the vector appears in a context where it can be modified (which can only + be an assignment LHS, or in input IO), lowering could transform the + hlfir.elemental into hlfir.forall (for assignments), or a fir.iter_while (for + input IO) by inlining the elemental body into the created loops, and + identifying the hlfir.designate producing the result. 
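The lowering strategy for `a(v) = b(v)` described in the list above can be modeled in Python. This is a sketch under assumptions: `b_section` plays the role of the hlfir.elemental for `b(v)`, the loop plays the role of the hlfir.forall, and the helper name is hypothetical. Since `a` and `b` are distinct non-target dummy arguments here, no temporary is modeled.

```python
def assign_vector_subscripted(a, b, v):
    """Model of a(v) = b(v) with one-based Fortran subscripts stored in v."""
    # Deferred right-hand side: element i of b(v) is b(v(i)).
    b_section = lambda i: b[v[i - 1] - 1]
    # Loop over the shape of v; the left-hand side address is a(v(i)).
    for i in range(1, len(v) + 1):
        a[v[i - 1] - 1] = b_section(i)

a = [0, 0, 0, 0, 0]
b = [10, 20, 30, 40, 50]
v = [4, 1, 3]
assign_vector_subscripted(a, b, v)
```

Only the elements of `a` selected by `v` are written, and the section `b(v)` is never stored as a whole array.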
+ +```HLFIR +func.func @_QPfoo(%arg0: !fir.ref>, %arg1: !fir.ref>, %arg2: !fir.box<>) { + %a = fir.declare %arg0 {fir.def = "_QPfooEa"} : !fir.ref> + %b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.ref> + %v = fir.declare %arg2 {fir.def = "_QPfooEv"} : !fir.box> + %vshape = hlfir.shape_of %v : fir.shape<1> + %bsection = hlfir.elemental(%i:index) %vshape : (fir.shape<1>) -> hlfir.expr { + %v_elt = hlfir.designate %v, %i {fir.ref = "_QPfooEv", fir.def="_QPfooEv.des001"} : (!fir.box>, index) -> fir.ref + %v_val = fir.load %v_elt : fir.ref + %cast = fir.convert %v_val : (i32) -> index + %b_elt = hlfir.designate %b, %cast {fir.ref = "_QPfooEb", fir.def="_QPfooEb.des002"} : (!fir.ref>, index) -> fir.ref + %b_val = fir.load %b_elt : fir.ref + fir.result %b_val + } + %extent = hlfir.get_extent %vshape, 0 : (fir.shape<1>) -> index + %c1 = arith.constant 1 : index + hlfir.forall (%i from %c1 to %extent step %c1) { + %b_section_val = hlfir.apply %bsection, %i : (hlfir.expr, index) -> f32 + %v_elt = hlfir.designate %v, %i {fir.ref = "_QPfooEv", fir.def="_QPfooEv.des003"} : (!fir.box>, index) -> fir.ref + %v_val = fir.load %v_elt : fir.ref + %cast = fir.convert %v_val : (i32) -> index + %a_elt = hlfir.designate %a, %cast {fir.ref = "_QPfooEa", fir.def="_QPfooEa.des004"} : (!fir.ref>, index) -> fir.ref + hlfir.assign %b_section_val to %a_elt {fir.ref="_QPfooEa.des004"} : f32, fir.ref + } + return +} +``` + +This would then be lowered as described in the examples above (hlfir.elemental +will be inlined, hlfir.forall will be rewritten into normal loops taking into +account the alias analysis, and hlfir.assign/hlfir.designate operations will be +lowered to fir.array_coor and fir.store operations). + +# Alternatives that were not retained + +## Using a non-MLIR based mutable CFG representation + +An option would have been to extend the PFT to describe expressions in a way +that can be annotated and modified with the ability to introduce temporaries. 
+This has been rejected because it would imply a whole new set of +infrastructure and data structures while FIR is already using MLIR +infrastructure, so enriching FIR seems a smoother approach and will benefit from +the MLIR infrastructure experience that was gained. + +## Using some existing MLIR dialects for the high-level Fortran. + +### Why not use the Linalg dialect? + +The linalg dialect offers a powerful way to represent array operations: the +linalg.generic operation takes a set of input and output arrays, a related set +of affine maps to represent how these inputs/outputs are to be addressed, and a +region detailing what operation should happen at each iteration point, given the +input and output array elements. It seems mainly intended to optimize matmul, +dot, and sum. + +Issues: + +- The linalg dialect is tightly linked to the tensor/memref concepts that + cannot represent byte stride based discontinuity and would most likely + require FIR to use MLIR memref descriptor format to take advantage of it. +- It is not clear whether all Fortran array expression addressing can be + represented as semi-affine maps. For instance, vector subscripted entities + can probably not, which may force creating temporaries for the related + designator expressions to fit in this framework. Fortran has a lot more + transformational intrinsics than matmul, dot, and sum that can and should + still be optimized. + +So while there may be benefits to using linalg at the optimization level (like +rewriting fir.sum/fir.matmul to a linalg sum, with dialect types plumbing +around the operand and results, to get tiling done by linalg), using it as a +lowering target would not cover all Fortran needs (especially for the non +semi-affine cases). +So using linalg is for now left as an optimization pass opportunity that could +be experimented with in some cases. + +### Why not use the Shape dialect? + +MLIR shape dialect gives a set of operations to manipulate shapes. 
The shape.meet operation is similar to hlfir.shape_meet, except that it
+returns a tensor or a shape.shape.
+
+The main issue with using the shape dialect is that it is dependent on tensors.
+Bringing the tensor toolchain in flang for the sole purpose of manipulating
+shapes is not seen as beneficial given that the only thing Fortran needs is
+shape.meet. The shape dialect is a lot more complex because it is intended to
+deal with computations involving dynamically ranked entities, which is not the
+case in Fortran (assumed rank usage in Fortran is greatly limited).
+
+## Using embox/rebox and box as an alternative to fir.declare/hlfir.designate and the hlfir.expr/variable concepts
+
+All Fortran entities (*) can be described at runtime by a fir.box, except for
+some attributes that are not part of the runtime descriptors (like TARGET,
+OPTIONAL or VOLATILE). In that sense, it would be possible to replace
+fir.declare, hlfir.designate, and hlfir.associate with embox/rebox, and to have
+all operations that create a hlfir.expr create a fir.box instead.
+
+This was rejected for two reasons. First, it would lack clarity and make
+embox/rebox semantics far too complex (their codegen is already non-trivial).
+Second, it would no longer be possible to know whether a fir.box operand is an
+expression or a variable, which would make reasoning harder: using a fir.box
+implies that expressions have already been buffered, and it is not clear when
+looking at a fir.box whether the value it describes may change, whereas a
+hlfir.expr value cannot change, which allows moving its usages more easily.
+
+This would also risk generating too many runtime descriptor reads and writes
+that could make later optimizations harder.
+
+Hence, while this would be functionally possible, it would make reasoning about
+the IR harder and would not benefit high-level optimizations.
+
+(*) This is not true for vector subscripted variables, but the proposed plan
+will also not allow creating vector subscripted variables as the result of a
+hlfir.designate. Lowering will deal with the assignment and input IO special
+cases using hlfir.elemental.
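+
+To make the footnote concrete, here is a minimal Fortran sketch (the program
+and variable names are hypothetical) of the two contexts in which a vector
+subscripted designator is defined: the left-hand side of an assignment and an
+input item of a READ statement. The subscript vector has no repeated values,
+which Fortran requires when such a section is defined.
+
+```Fortran
+program vector_subscript_demo
+  implicit none
+  real :: a(5), b(5)
+  integer :: v(3)
+  character(len=16) :: line
+
+  a = 0.0
+  b = [10.0, 20.0, 30.0, 40.0, 50.0]
+  v = [4, 1, 3]          ! no repeated values, so a(v) may be defined
+
+  ! Assignment special case: a(4), a(1) and a(3) receive b(4), b(1) and b(3).
+  a(v) = b(v)
+  print *, a
+
+  ! Input IO special case: the three values read go to a(4), a(1) and a(3).
+  line = "1.0 2.0 3.0"
+  read(line, *) a(v)
+  print *, a
+end program vector_subscript_demo
+```
+
+In both statements the elements of `a(v)` are not separated by a regular
+stride, which is why such a designator cannot be described by a fir.box and is
+expected to be handled element by element instead.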