This is an archive of the discontinued LLVM Phabricator instance.

[flang][RFC] Adding higher level FIR ops to ease expression lowering
ClosedPublic

Authored by jeanPerier on Sep 20 2022, 8:38 AM.

Details

Summary

This document describes a new FIR value type (fir.expr) and some new
higher level FIR operations to make lowering to FIR more straightforward
and make pattern matching of high level Fortran concepts (array and
character assignments, character operations, and transformational
intrinsics) easier in FIR.

This should allow implementing the remaining gaps in Fortran 95 features
without increasing the lowering code complexity too much, and get a
clean start for the remaining F2003 and F2018 features.
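
As a rough illustration of the direction (the op names and syntax below are informal sketches of the proposal, not a final design), an array assignment like x = y + 1. could be lowered to something like:

%x = fir.declare %x_addr ...   // gives access to the Fortran variable information
%y = fir.declare %y_addr ...
%tmp = fir.elemental %shape : !fir.expr<?xf32> {
  // element-wise computation of y(i) + 1.0
}
fir.assign %tmp to %x          // the array assignment stays a single high level op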

Diff Detail

Event Timeline

jeanPerier created this revision.Sep 20 2022, 8:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 20 2022, 8:38 AM
jeanPerier requested review of this revision.Sep 20 2022, 8:38 AM
tschuett added inline comments.Sep 20 2022, 11:05 AM
flang/docs/HighLevelFIR.md
961

Isn't this point about the expressiveness of MLIR? What are the engineering costs of adding a FIRX dialect for the higher ops?

What would be the benefit of having a separate dialect?

jeanPerier added inline comments.Sep 21 2022, 1:13 AM
flang/docs/HighLevelFIR.md
961

The benefit of having a separate dialect is to strongly split the high level ops, which require information about Fortran variables to be retrievable in the IR (via fir.def/fir.ref), from the current operations, which are lower level and do not require such information.
After the translation pass of the variable related operations, this dialect would be illegal.

There is a precedent in FIR with the fir::cg dialect ops (https://github.com/llvm/llvm-project/blob/main/flang/include/flang/Optimizer/CodeGen/CGOps.td) that help simplify the addressing and emboxing operations before codegen.

I think the engineering costs are mostly about having separate .td files, headers, and .cpp files for these new ops and types, but also about registering the dialect in the passes that will work with it. That last point may be a bit more annoying (it does not matter for the fir::cg dialect because codegen is the only pass meant to be run with that dialect).

I do not have a strong opinion here.

Very good design. This seems to be a promising way to handle some bugs in FORALL, to pattern match for inlining intrinsics and OpenMP reductions (not totally sure it will work), and it should expose more information about variables and expressions for performance optimization in FIR.

flang/docs/HighLevelFIR.md
75

Nit

87

Will the debug information be mlir::Location?

89

nit

89

Will this include the aliasing analysis in Fortran 2018 15.5.2.13? Is this Generic Fortran aliasing analysis documented somewhere with examples currently?

94

Nit

137

Can COMPARE be added to the set of character operations? We found that inlining character comparison gives a great performance improvement in some workloads (possibly because the character appears in a SELECT CASE statement). Would adding it make inlining easier?

170

Currently, the OpenMP reduction clause uses a fragile pattern match in lowering. When refactoring the lowering code, is it possible to make some expression lowering shared with OpenMP using fir.expr? Specifically, the following operators, intrinsics, and user-defined operators will be redefined in the OpenMP reduction clause:

+, *, .and., .or., .eqv., .neqv., max, min, iand, ior, ieor
196

Why keep two complex types instead of only using fir.complex or mlir::complex?

212

editor problem?

227

This should be added in the description when this operation is added in FIROps.td.

I am wondering if fir.expr<T> is a good solution for arguments with the VALUE attribute in lowering (on both the caller and callee sides), especially for derived types and BIND(C)?

230
267

Will fir.declare be used for common blocks as well?

310
371

to help -> help?

437

editor problem?

461
530

editor problem?

531

editor problem?

542

nit

545

It seems there is some editor problem from line 542 to line 547.

699

?

751
781

Another thought: the LHS variable appearing in the mask expression might be changed in the assignment. There seems to be a bug in the current FIR lowering. fir.forall may make it easier to support that case.

793

?

809
970

Same for line 970, 978-981, 992-1007, 1032-1041, 1053-1056, ...

peixin added inline comments.Sep 21 2022, 2:19 AM
flang/docs/HighLevelFIR.md
882
1140
1331
mehdi_amini added inline comments.Sep 21 2022, 6:09 AM
flang/docs/HighLevelFIR.md
961

to register the dialect in the passes that will work with it.

To be clear: registration is only ever needed in a pass that produces entities from a dialect (ops, attributes, types) when this dialect isn't already in the input. So a pass lowering from a high-level dialect to a lower-level dialect only needs to declare the lower-level dialects for registration; nothing needs to be done for the high-level dialect.
Similarly passes that transform within the same dialect don't need to declare anything at all.

jeanPerier marked 23 inline comments as done.
  • Remove non breaking spaces
  • Fix double spaces issues
  • Fix some typos
  • Add note about Forall masks

Thanks for the review and feedback @peixin.

flang/docs/HighLevelFIR.md
87

No, I believe source line locations are already making it into the executable. What we are currently missing is the ability to interact with Fortran source variables from a debugger (e.g. print, set, or watch a variable). I am not sure exactly how and when the related DWARF will have to be generated. This is a big task I should add to our TODO list. The point here is that fir.declare should allow generating this information late, because it documents everything that needs to be known about a source variable (address or descriptor, bounds, type parameters).

89

Yes, that is the idea. And no, this is not documented yet; it would deserve its own document. The idea here is that when seeing an address in FIR, we will try to find a matching fir.declare or fir.associate node. If it is the defining operation, we will directly have all the Fortran information that allows applying 15.5.2.13. If the address is a block argument, the analysis will try to follow the block argument predecessors if it can, and build a list of possible Fortran variables the address may belong to. If such a list cannot be built, the analysis will have to conservatively assume that the address may overlap with anything.
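
For instance, in a sketch like the following (syntax illustrative), the analysis walking the defining operations of the addresses gets back the Fortran attributes that the 15.5.2.13 rules reason about:

%x = fir.declare %x_addr {fir.def = _QPfooEx} : !fir.ref<!fir.array<?xf32>>
%t = fir.declare %t_addr {fir.def = _QPfooEt, fir.target} : !fir.ref<!fir.array<?xf32>>
// %x carries neither TARGET nor POINTER, while %t is declared with the TARGET
// attribute: the Fortran-level facts needed to apply the 15.5.2.13 argument
// association rules are readable directly from the defining fir.declare ops.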

137

Yes, I had it in mind in the "..." parts. I made it explicit.

170

Interesting, what are you currently pattern matching?
I am not sure the current design would help you a lot, in the sense that it does not intend to add operation-specific elemental ops, but rather to use a generic fir.elemental concept.

As much as possible, I would tend to extend the mlir arith dialect to support the scalar operations applicable to MLIR integer and floating point types and pattern match that. But then, for some elemental operators and intrinsics for which easy pattern matching is desirable, and where adding an arith op is not the best way, it could make sense to add custom ops. But this may be orthogonal to the proposed change and something that could be thought out/designed/done in parallel, operator by operator.

Regarding user defined operators, I would need to understand better what you mean by "redefined": semantics already resolves user defined operators to function references, so we do not really see them in lowering, I believe.

196

Why keep two complex types instead of only using fir.complex or mlir::complex?

The need arises from the fact that we want to make a difference between the C/Fortran complex type and the C++ std::complex type. These types are layout compatible, but not ABI compatible (they are not passed the same way on all architectures). fir::Complex implements C/Fortran complex, and mlir::Complex is translated to an std::complex-like struct.

In the runtime API, we sometimes interface with std::complex<>.

Using distinct types to enforce the ABI may not be the only solution, but it was the simplest and most robust so far (especially given we already had both types). If a solution is found regarding this ABI problem, moving to using mlir::complex would sound OK to me. I see it as something orthogonal to these high level changes (it could be done in parallel). The solution is probably to translate std::complex to a struct type and consider mlir::Complex to be the C complex from an ABI point of view (but I think this may conflict with MLIR's current codegen towards LLVM, which would have to be overridden). fir.convert between mlir complex and the related struct type would have to be allowed, and probably implemented by going through memory and doing the cast there to rely on the layout compatibility property.

212

Yes, thanks. I replaced all the non breaking spaces with normal spaces and removed the redundant ones like here.

227

I am thinking if fix.expr<T> is a good solution for the argument with VALUE attribute in lowering (both caller and callee side), especially for the derived type and BIND(C)?

It may help, but since this type is not intended to survive until LLVM codegen, the VALUE aspect would still have to be translated with the current FIR types as it is currently.

267

It will be used for the common block members once the address computation is done (maybe with the common block name as a fir.common attribute so that it can easily be identified which common block a variable belongs to). Same idea with equivalences. I am not sure if there is a need to declare the whole common block in a declare op, but if it turns out useful for OpenMP or other usages, why not.

781

Thanks, indeed. I added a note about masks.

793

Yes, I agree the mask evaluation cannot affect anything (just like the forall indices evaluation), but the mask evaluation may be affected by the assignments.

970

Thanks, all the non breaking space issues should have been fixed.

jeanPerier added inline comments.Sep 21 2022, 7:41 AM
flang/docs/HighLevelFIR.md
961

Thanks for clarifying this point. Then I do not see big engineering costs in splitting the high level ops that depend on the Fortran variable and expression concepts from the low level operations operating on memory and simple (integer/floating point/complex) SSA values.

Maybe the new fir.declare op should still belong to FIR directly though. I think it would be valuable to keep it until as late as possible (it is not doing anything other than bookkeeping some Fortran level information about memory storages, which can be useful until the end).

tarunprabhu added inline comments.Sep 21 2022, 8:35 AM
flang/docs/HighLevelFIR.md
961

I, too, think it makes sense to keep this in FIR. Any Fortran-specific tooling that operates on MLIR may find this useful, so having it in FIR would be helpful.

flang/docs/HighLevelFIR.md
22

And the ArrayValueCopy pass will not be needed?

84

From the OpenMP side, in the long-term we would like to move privatisation into the OpenMP dialect. So it will be very helpful if variable declarations are available in FIR. On the other hand, having more kinds of variables in Flang (FIR, HighFIR) would mean that we will have to get OpenMP Dialect to work with these different kinds of variables.

Can you consider adding the finalizer also to the declare operation?

170

Thanks @peixin for raising this point.

Currently we do a pattern match fir loads -> (optional convert) -> reduction operation in FIR/arith -> (optional convert) -> store and replace it with an omp.reduction operation.

So, for a reduction happening on an addition (say b = b + a), we pattern match on the following IR:

%b = fir.load %bref
%a = fir.load %aref
%r = arith.add %a, %b
fir.store %r to %bref

and get the following result IR.

%a = fir.load %aref
omp.reduction %a, %bref

An alternative approach that we are exploring is to see whether this can be done directly during lowering, i.e. to do custom lowering for the Assignment operation if it happens to perform a reduction.

Thanks for addressing the comments and explanations. LGTM.

flang/docs/HighLevelFIR.md
170

Thanks @kiranchandramohan for answering this question.

For "redefined", what I mean is that the addition statement b = b + a is not lowered as %b_new_val = arith.add %b_val, %a_val any more. Instead, it will be lowered as omp.reduction %a_val, %b_red.

The current expression lowering for this statement is encapsulated in ScalarExprLowering, and it is hard to change for OpenMP. With these high-level FIR ops, and given that FIR lowering is going to be refactored, is it possible to expose this statement analysis and lowering so that it is open to being changed?

Beyond this, there are two scenarios with a similar problem:

  1. Complex multiplication and complex conjugate multiplication

The current lowering and codegen do not generate the "best" LLVM IR. See https://reviews.llvm.org/D134364#3807605 for some performance considerations. With these high-level FIR ops, can the FIR lowering be refactored so that some complex operations are lowered directly as one IR instruction?

  2. target/chip dependent transformation

One example is as follows:

subroutine sub(a, b, c, m, n, k)
  real :: a(m, k), b(k, n), c(m, n)
  !DIR ARM-SME (scalable matrix extension)
  c = a * b
end

With the compiler directive, the statement c = a * b can be lowered using some special transformation.

All in all, given the new high-level FIR ops and the fact that FIR lowering will be refactored, can we make the Fortran statement analysis and lowering more generic and extensible for possible optimizations with OpenMP, compiler directives, or anything else? Some Fortran statement lowering may be moved to codegen so that custom optimizations can be performed.

jeanPerier marked an inline comment as done.Sep 22 2022, 2:25 AM
jeanPerier added inline comments.
flang/docs/HighLevelFIR.md
22

Right, it will be refactored into the (array) fir.assign and fir.forall translation passes.

84

On the other hand, having more kinds of variables in Flang (FIR, HighFIR) would mean that we will have to get OpenMP Dialect to work with these different kinds of variables.

With the proposed strategy, the variables in FIR and high level FIR would be the same from an SSA type point of view (memory references and boxes like currently). The difference would be that all the shape/type parameters information related to that address would be guaranteed to be retrievable in one case, and not the other.

Given that you will probably need this information to privatize the variables, you could decide to make the privatization use the high level concept, but you could also directly use a "lower" level operation where all the side information (bounds, type params...) required for privatization are operands of the privatization operation.

Basically, you can freely use a "variable" SSA value produced in a fir.declare in "lower" level operations that do not need to access the info in the fir.declare. It is perfectly fine to have the current FIR operations mixed with the new higher level operations.
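
A small sketch of this mixing (illustrative syntax; %expr, %shape, and %i stand for values defined elsewhere): the SSA value defined by a fir.declare is an ordinary memory reference for the existing FIR ops, while the high level ops can look through its defining op when they need the variable information.

%x = fir.declare %x_storage ... {fir.def = _QPfooEx} : !fir.ref<!fir.array<?xf32>>
// High level op: bounds/type parameters are retrieved via the defining fir.declare.
fir.assign %expr to %x
// Current FIR ops can use the very same SSA value and simply ignore the declare.
%addr = fir.array_coor %x(%shape) %i : (!fir.ref<!fir.array<?xf32>>, !fir.shape<1>, index) -> !fir.ref<f32>
%val = fir.load %addr : !fir.ref<f32>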

Can you consider adding the finalizer also to the declare operation?

Why not. Would you need it to know how to finalize OpenMP private copies after privatization?

Note that the runtime knows how to retrieve finalizer information from the descriptors, so calling the Finalize runtime routine on privatized variables should also be sufficient (in which case you only need to know whether a type needs to be finalized, not exactly how).

170

for some performance considerations. With these high-level FIR ops, can the FIR lowering be refactored so that some complex operations are lowered directly as one IR instruction.

I think moving to using the mlir::complex type and operations is independent from these high level operation changes. The changes proposed here do not rely on whether complexes are lowered to fir.complex or mlir::complex, nor on how they are manipulated. I am not opposed to moving to this as long as the ABIs are solved.

For "redefined", what I mean is that the addition statement b = b + a is not lowered as %b_new_val = arith.add %b_val, %a_val any more. Instead, it will be lowered as omp.reduction %a_val, %b_red.

Are you doing this only when b is a scalar or also when b is an array ?

Why are you dissatisfied with the pattern matching at the MLIR level ? Wouldn't pattern matching the RHS and LHS evaluate::Expr in lowering be equally complex (although maybe a bit more stable) ?

With the compiler directive, the statement c = a * b can be lowered using some special transformation.

That case is interesting. I am not sure if we could use the high level ops to lower such a directive or if it would have to be applied directly in lowering.

I guess the fir.assign op could be given a special attribute, and the application of the directive could be delayed to the fir.assign translation.
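
For illustration, a sketch of that idea (the attribute name is hypothetical): lowering would only tag the assignment, and the fir.assign translation pass would decide whether and how to apply the special lowering.

fir.assign %rhs to %c {fir.directive = "arm_sme"}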

flang/docs/HighLevelFIR.md
170
Are you doing this only when b is a scalar or also when b is an array ?

Why are you dissatisfied with the pattern matching at the MLIR level ? Wouldn't pattern matching the RHS and LHS evaluate::Expr in lowering be equally complex (although maybe a bit more stable) ?

Currently, we have implemented it only for scalars, but the pattern matching will become complicated for pointers, allocatables and arrays. Rather than pattern matching on some FIR, once we detect that the Assignment is actually a reduction we can try to write lowering code that generates an appropriate omp.reduction operation. Alternatively, if the pattern matching is easier at the new High Level FIR we can do it there.

peixin added inline comments.Sep 22 2022, 4:23 AM
flang/docs/HighLevelFIR.md
170

I think moving to using the mlir::complex type and operations is independent from these high level operation changes. The changes proposed here do not rely on whether complexes are lowered to fir.complex or mlir::complex, nor on how they are manipulated. I am not opposed to moving to this as long as the ABIs are solved.

For the following case:

subroutine sub
  complex :: a, b, c
  c = conjg(a) * b
end

With these high level FIR Ops, I hope to generate the following FIR:

%a = fir.declare ...
%b = fir.declare ...
%c = complex.conjmul %a, %b : fir.complex<4>

Anyway, I will keep watching if we can do this when refactoring FIR lowering using these high level FIR ops in the future. We had one initial try in classic-flang. If possible, we hope to contribute the middle-end and back-end code together with F18 frontend support.

Are you doing this only when b is a scalar or also when b is an array ?

It can be either a scalar, an array, or an array section. If it is an array or array section, it will be treated as if a reduction clause were applied to each separate element of the array section.

Why are you dissatisfied with the pattern matching at the MLIR level ? Wouldn't pattern matching the RHS and LHS evaluate::Expr in lowering be equally complex (although maybe a bit more stable) ?

It generates FIR, matches the pattern, and then replaces the generated FIR with omp.reduction. The current pattern matching is fine, at least no bug has been found. However, if we can lower it from the evaluate::Expr, it may be better.

Here is one example (from the OpenMP Spec attached examples):

SUBROUTINE REDUCTION1(A, B, C, D, X, Y, N)
 REAL :: X(*), A, D
 INTEGER :: Y(*), N, B, C
 INTEGER :: I
 A = 0
 B = 0
 C = Y(1)
 D = X(1)
 !$OMP PARALLEL DO PRIVATE(I) SHARED(X, Y, N) REDUCTION(+:A) &
 !$OMP& REDUCTION(IEOR:B) REDUCTION(MIN:C) REDUCTION(MAX:D)
 DO I=1,N
 A = A + X(I)
 B = IEOR(B, Y(I))
 C = MIN(C, Y(I))
 IF (D < X(I)) D = X(I)
 END DO

END SUBROUTINE REDUCTION1

I think it is hard for the pattern match to capture the MAX reduction of D. From the evaluate::Expr, we only need to check whether GetSymbol(lhs) is a reduction list item. We can emit an assertion if the rhs is not a reduction identifier such as +, IEOR, or MIN here. For the IF statement, it is hard to give a reasonable assertion.

That case is interesting. I am not sure if we could use the high level ops to lower such directive or if it would have to be applied directly in lowering. I guess the fir.assign op could be given a special attribute, and that the application of the directive could be delayed to fir.assign translation.

Thanks for the confirmation.

peixin added inline comments.Sep 22 2022, 4:26 AM
flang/docs/HighLevelFIR.md
170

Agree.

jeanPerier marked an inline comment as done.Sep 23 2022, 2:09 AM
jeanPerier added inline comments.
flang/docs/HighLevelFIR.md
170

For scalar reductions, the FIR pattern will not change with the proposal; for arrays and array sections, it will change (you would have a fir.elemental containing the scalar reduction operation plus a fir.assign, a bit like in the second example in the example section below; see also the sketch after this comment). With arrays, you would also need to be careful with potential overlaps in the assignments.
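
A sketch of the array case under the proposal (op names and syntax are illustrative, including the region terminator):

// b = b + a with rank-1 arrays; %a_expr and %b_expr are fir.expr values
// obtained from the variables beforehand.
%sum = fir.elemental %shape : !fir.expr<?xf32> {
^bb0(%i: index):
  %b_i = fir.apply %b_expr, %i : f32
  %a_i = fir.apply %a_expr, %i : f32
  %add = arith.addf %b_i, %a_i : f32
  // yield %add as the element value (exact terminator left open here)
}
fir.assign %sum to %b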

I think we should aim at being able to detect reductions in FIR. OpenMP might not be the only case that would want to do special optimization with a reduction. Now, if replacing the generated FIR by an omp.reduction is required from a correctness point of view, that may be problematic, because I am not sure we can guarantee that MLIR pattern matching will work 100% of the time if some in-between passes slightly modify the FIR in correct but unexpected ways.

170

With these high level FIR Ops, I hope to generate the following FIR:

I do not think the high level ops will allow producing exactly what you wrote, unless we were to add FIR-specific complex operations working on "variables", and that was not my goal since the MLIR complex dialect can already do this. However, I think that what it could be lowered to would still suit your optimization goal.

This could be lowered to:

%a = fir.declare ... : fir.ref<mlir::complex<f32>>
%b = fir.declare ... : fir.ref<mlir::complex<f32>>
%c = fir.declare ... : fir.ref<mlir::complex<f32>>
%aval = fir.load %a
%aconj = complex.conj %aval
%bval = fir.load %b
%res = complex.mul %aconj, %bval
fir.assign %res to %c

The MLIR complex dialect canonicalization/optimization passes would then have to fold the MUL with a conj operand into a single conjmul op (does it exist yet in the MLIR complex dialect? I could not find it).

Here is one example (from the OpenMP Spec attached examples):

Thanks for the example. One related question: is semantics checking that what is written in the REDUCTION clause actually happens in the loop (e.g., is it checking that B appears in an assignment with IEOR)?

If so, that means that this kind of pattern matching is already available on the parse tree / evaluate expr.
I am not a fan of low level intrinsic operations (like +) being lowered differently based on the context, so I would avoid it if possible. But if this turns out to be really required, we could try to find clean ways to override the lowering of certain expression nodes without redefining a completely different parse-tree/expression visitor.

peixin added inline comments.Sep 23 2022, 2:32 AM
flang/docs/HighLevelFIR.md
170

The MLIR complex dialect canonicalization/optimization passes would then have to fold the MUL with a conj operand into a single conjmul op

Good point. Thanks.

(does it exists yet in the MLIR complex dialect, I could not find it ?).

Not for now. If there is an opportunity for performance improvement, I think there is no reason to oppose adding such an op.

One related question, is semantics checking that what is written in the REDUCTION clause actually happens in the loop (e.g., is it checking that B appears in an assignments with IEOR) ?

No. I don't find any related semantic restrictions in the OpenMP Spec. I think users should guarantee it. @kiranchandramohan Right?

Here is another example in the standard attached example:

!$omp parallel do num_threads(M) reduction(task,+:x)
do i = 1,N
  x=x+1

  if( mod(i,2) == 0) then
    !$omp task in_reduction(+:x)
      x=x-1
    !$omp end task
  endif
end do

For this case, it is hard to know the pattern from the parse-tree/evaluate expr before lowering.

flang/docs/HighLevelFIR.md
170
One related question, is semantics checking that what is written in the REDUCTION clause actually happens in the loop (e.g., is it checking that B appears in an assignments with IEOR) ?

No. I don't find any related semantic restrictions in the OpenMP Spec. I think users should guarantee it. @kiranchandramohan Right?

Yes, there is no such restriction in the standard.

Since the parse-tree models the source more or less directly, the representation for reduction is a clause and not a separate parse-tree node that encapsulates an assignment statement or an expression. We add reduction symbols to the variables involved in the reduction operation but I don't know whether these are carried over to the expression tree.

The whole proposal seems very reasonable to me and definitely an improvement to the current status quo. Thanks a lot for putting this together @jeanPerier.

flang/docs/HighLevelFIR.md
577

I'm curious about the choice of the name, why not something closer to Fortran like array_element? apply seems overly generic to me, but maybe there is precedent elsewhere?

jeanPerier added inline comments.Oct 6 2022, 6:28 AM
flang/docs/HighLevelFIR.md
577

The rationale for that name was to insist on the fact that this should not be seen as a memory addressing operation at that level, but rather as an operation that, in certain circumstances (*), can see the fir.expr defining operation as a lambda and "apply" it given a set of indices (inlining the fir.elemental body at the fir.apply).

Affine has an operation named "affine.apply" that applies an affine map on a set of indices.

Now I am open to other names. Does anyone else have opinions regarding array_element vs apply naming?

(*) Basically, that there should not be any operation between the fir.expr evaluation and its use that may affect (or be affected by) the expression evaluation.
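
A sketch of the intended rewrite (illustrative syntax): when (*) holds, the fir.apply of a fir.elemental result can be folded by inlining the element computation at the apply site.

%e = fir.elemental %shape : !fir.expr<?xf32> {
^bb0(%k: index):
  // element computation E(%k)
}
%x = fir.apply %e, %i : f32
// When nothing between the fir.elemental and the fir.apply may affect (or be
// affected by) the expression evaluation, %x can be rewritten into E(%i)
// directly, without ever materializing %e in memory.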

klausler added inline comments.Oct 6 2022, 8:13 AM
flang/docs/HighLevelFIR.md
577

I find the name to be entirely appropriate in the function composition view of array programming.

Thanks @jeanPerier for this RFC. Quite detailed, informative and looks good. I had a quick read and have a few questions or comments inline.

I also have the following general questions,
-> Do you see any issues in FIR/MLIR optimisations due to the presence of fir.declare?
-> Would there be scope for the community to get involved in this work to speed it up?
-> Would it be better to delay any work on emitting debug info till HighLevelFIR work is complete?

flang/docs/HighLevelFIR.md
96

Until when will these fir.declare's persist? Will this have an effect on optimisations, particularly mem2reg kind of transformations?

316

Can this be modeled by a region instead?

We discussed this in the Flang technical call. The reasons mentioned include difficulty handling exits, branches out of this etc.

442

Will there be fir.allocate or fir.deallocate operations generated for these?

485

Will fir.allocate lower to fir.allocmem or to runtime calls? Why?

537

Nit: Could you add some explanation for one based?

583
624

I am assuming the simplification of intrinsics pass (with minor modifications) will still be useful for generating an inlined version. Or would lowering and high-level FIR make this redundant?

688

Would this have a lowering to fir.allocmem?

935
1043
1056–1061
1077

Is fir.get_lbound a new operation?

jeanPerier marked 3 inline comments as done.
  • Clarify fir.allocate/fir.deallocate lowering
  • Clarify why fir.apply is one based
  • Add fir.get_lbound description
  • Fix a few typos

Thanks @jeanPerier for this RFC. Quite detailed, informative and looks good. I had a quick read and have a few questions or comments inline.

I also have the following general questions,

Thanks for the review @kiranchandramohan !

-> Do you see any issues in FIR/MLIR optimisations due to the presence of fir.declare?

I think it will help building alias analysis for FIR, and that will help optimizations in general. It can be labeled as having no side effects, so it should not disturb passes that do not care much about address origins. But you are raising a good point that mem2reg will have to do something about it.

-> Would there be scope for the community to get involved in this work to speed it up?

Yes, I think help will be welcome. I am trying to set up a skeleton for this new flow, and then some help will be welcome, especially around the new FIR character ops and intrinsics.

-> Would it be better to delay any work on emitting debug info till HighLevelFIR work is complete?

No, if anyone wants to jump on debug info, please do, bearing in mind that fir.declare should be a good fit for extracting the required info. I will try to add this op early if anyone wants to work on this. I anyway think the first piece of debug info work will be a small document summarizing what LLVM needs and giving a plan.

flang/docs/HighLevelFIR.md
96

My plan was for fir.declare to persist until LLVM codegen since it is a no-op that only carries information. But you are raising a good point about mem2reg. Do you know if there is a generic MLIR mem2reg pass? In any case, I expect a mem2reg pass might need to be aware of the fir.declare "transparent" aspect and be able to delete it for variables promoted to SSA values (and move its attributes somewhere if debug info is still needed for the variable? I am not familiar enough with debug info to be assertive here).

Regarding optimization in general, I think keeping it is in our interest since it will allow documenting Fortran aspects of variables, such as attributes (TARGET vs not TARGET), even after inlining, which should allow better alias analysis.
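
A sketch of that point (illustrative syntax): after inlining a call bar(x) whose dummy argument has the TARGET attribute, the attribute remains visible in the IR.

%x = fir.declare %x_storage {fir.def = _QPfooEx} : !fir.ref<f32>
// Inlined body of bar: the dummy's declare keeps the Fortran attributes.
%d = fir.declare %x {fir.def = _QPbarEd, fir.target} : !fir.ref<f32>
// Alias analysis can still see that %d designates a TARGET dummy associated
// with %x, information that would otherwise be lost once the call is inlined.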

316

One of the main issues, I think, is that it is intended to deal with argument association on the caller side. So if we used regions, you would have something like:

fir.associate %actual1 to %dummy1 {
  fir.associate %actual2 to %dummy2 {
    ...
    %res = fir.call @foo(%dummy1, %dummy2)
    ...
  }
}

The first issue here is that %res is not accessible after the call, where it needs to be used. It could be propagated back, but in general this means that any SSA value created during the association lifetime is unusable afterwards, even if it is not linked to the variable lifetime. And I find this very restrictive and hard to work with.

Then, nested regions are harder to reorder than operations in the same block. Fortran says arguments can be evaluated in any order, but generating such an associate nest in lowering would, in my opinion, make it harder to re-order things here to do better CSE, for instance.

442

Not until alias analysis is done. The rationale is that the reallocation can be optimized when there is a potential overlap (the temp can be moved to the new allocatable value).

So lowering will only add the realloc attribute, and a FIR to FIR pass will lower this to a set of fir.assign (without the realloc attribute) + fir.allocate + fir.deallocate + necessary fir.if nests taking advantage of alias analysis.
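
A very rough sketch of that expansion (illustrative op names, ignoring type parameters and error handling):

// fir.assign %rhs to %x realloc   would be rewritten by the pass into:
%needs_realloc = ...           // is %x unallocated or non-conformant with %rhs?
fir.if %needs_realloc {
  // fir.deallocate %x (if previously allocated), then fir.allocate %x with the RHS shape
}
fir.assign %rhs to %x          // plain assignment, no realloc attribute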

485

Will fir.allocate lower to fir.allocmem or to runtime calls

Currently, the runtime is only used when allocations are non trivial (they include initialization, polymorphic types, or error handling). However, we may want to offer users the ability to debug their allocations/deallocations by validating that ALLOCATABLEs being deallocated have indeed been allocated and are not pointing to bad addresses. In that case, the runtime would always be used.

In general, users may like the ability to easily hook into Fortran allocatable/pointer allocations via the runtime without recompiling programs. Some may prefer to have all allocations inlined when possible. So I think having the option in fir.allocate/fir.deallocate between fully inlined allocation with fir.allocmem and using the runtime makes sense.

I edited the sentence to make this clearer.

537

Nit: Could you add some explanation for one based?

Sure, I added a note. The purpose is to match the Fortran default for array variables so that there is no need to generate bound adjustments when working with one-based array variables in an expression.

624

I am assuming the simplification of intrinsics pass (with minor modifications) will still be useful for generating an inlined version.

Absolutely, the plan is to use this pass (modifying it so that it can tap into the fir.intrinsic ops).

688

It depends on the array constructor.
I think we should not generate a fir.allocmem for small and easy things like [%i, %j, %k] (only scalars, no ac-implied-do, fewer than N elements, where N is some option). This should lower to a fir.alloca.
Otherwise, yes, fir.allocmem will be used, although reallocation will still be needed to deal with the edge cases where the final size cannot be pre-computed (for instance [foo(), bar(), buzz()], where all three functions return rank-one allocatable arrays). See the sketch below.
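
A sketch of the two situations (illustrative; N would be a compiler option):

// [%i, %j, %k]: scalars only, no ac-implied-do, fewer than N elements -> stack temporary
%small = fir.alloca !fir.array<3xi32>
// ... store %i, %j, %k into %small

// [foo(), bar(), buzz()] with rank-one allocatable results -> heap temporary,
// grown by reallocation since the total size is not known up front.
%big = fir.allocmem !fir.array<?xi32>, %initial_extent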

1077

Thanks for catching this inconsistency. When writing FIR manually, I found it useful to avoid dealing with the details around pointers and allocatables (fir.load %box + fir.box_dims). But it is not a game changer, and I am still debating whether this is useful or not.

I added a description for it above assuming it will be added for now.

rogfer01 added inline comments.Oct 10 2022, 7:42 AM
flang/docs/HighLevelFIR.md
577

Hi @jeanPerier, thanks for the context.

No objections on the name.

I don't agree that this new dialect should extend FIR. They should be separate dialects.

Update the document to reflect that the operations will be added
in a new dialect HLFIR.

I don't agree that this new dialect should extend FIR. They should be separate dialects.

Thanks for the feedback. I think that was also the consensus in the https://reviews.llvm.org/D134285#inline-1294898 discussion. The exception will be fir.declare (where I see advantages to keeping it until LLVM codegen to handle things like debug info and to allow using Fortran level information, even after inlining, for alias analysis); apart from that, I agree that it makes sense to put the new operations in a separate dialect. I updated the document to reflect this.

Update the document to reflect that the operations will be added in a new dialect HLFIR.

Thanks @jeanPerier for the quick reply. A couple of comments or questions inline.

So I guess hlfir.expr<T> type is the only new type and is expected to work only with hlfir. Otherwise hlfir is expected to work with existing fir types?

flang/docs/HighLevelFIR.md
73

What will the frontend symbol be bound to after this change? Will it be the value generated by fir.declare or the fir.alloca ? I guess we might need some changes for the createHostAssociateVarClone function.

96

I don't think there is currently a mem2reg pass in MLIR. But there are a few downstream projects which have this and there were a couple of discussions talking about upstreaming this. But I don't think this has happened yet.
https://discourse.llvm.org/t/rfc-store-to-load-forwarding/59672
https://discourse.llvm.org/t/upstreaming-from-our-mlir-python-compiler-project/64931/4

I am assuming there will be changes required for our MemRefDataFlowOpt pass to account for the fir.declare operation.

I am also reminded of llvm's dbg intrinsics (https://llvm.org/docs/SourceLevelDebugging.html#id13). These intrinsics track variable information and include an llvm.dbg.declare intrinsic, which is deprecated. Not sure whether fir.declare will also face similar issues.

316

The first issue here is that %res is not accessible after the call, where it needs to be used. It could be propagated back, but in general this means that any SSA value created during the association lifetime is unusable afterwards, even if it is not linked to the variable lifetime. And I find this very restrictive and hard to work with.

Yes, that is right particularly after mem2reg kind of passes. But for load-store kind of code that lowering generates, it should not be a problem.

Thanks @jeanPerier for the quick reply. A couple of comments or questions inline.

So I guess hlfir.expr<T> type is the only new type and is expected to work only with hlfir. Otherwise hlfir is expected to work with existing fir types?

Yes, HLFIR will work with the existing FIR types (it has to, since it is not adding any types for variables). And apart from hlfir.forall, hlfir does not contain any construct-like operation; it is all about expressions and assignments, so lowering will generate FIR + HLFIR.

flang/docs/HighLevelFIR.md
73

In the symbol table in lowering, it will be bound to the fir.declare result.
The goal is that it becomes possible to emit high level operations (e.g. hlfir.assign) with the SSA value bound to the front-end symbol.

Regarding createHostAssociateVarClone, yes, it might need to emit a new fir.declare so that the new entity can be properly bound.

96

I am assuming there will be changes required for our MemRefDataFlowOpt pass to account for the fir.declare operation.

Yes, that seems right to me. I do not think we are running this pass currently. Regardless, fir.declare should allow getting a higher level view of the possible aliasing/side effects affecting an address, and should help get store-to-load forwarding working again safely.
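
A sketch of the kind of reasoning this should enable (illustrative syntax):

%i = fir.declare %i_alloca {fir.def = _QPfooEi} : !fir.ref<i32>
fir.store %c42 to %i : !fir.ref<i32>
fir.call @bar(%x) : (!fir.ref<f32>) -> ()
%v = fir.load %i : !fir.ref<i32>
// %i traces back to a fir.declare of a local variable without TARGET/POINTER
// whose address does not escape into the call, so the load can safely be
// forwarded to the value %c42 stored just above.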

I am also reminded of llvm's dbg intrinsics (https://llvm.org/docs/SourceLevelDebugging.html#id13). These intrinsics track variable information and includes a llvm.dbg.declare intrinsic which is deprecated. Not sure whether fir.declare will also face similar issues.

Thanks for this pointer. I see fir.declare as being a bit different than llvm.dbg.declare in the sense that fir.declare will give context to its output address, but it will have no impact on the semantics of its input.

You could very well imagine implementing storage sharing with fir.declare:

subroutine foo()
  integer :: i, j
  ! some code using i but not j

  ! some code using j but not i
end subroutine

After some optimization pass, you could have:

func.func @_QPfoo() {
  %storage = fir.alloca i32
  %i = fir.declare %storage {fir.def = _QPfooEi}
  %j = fir.declare %storage {fir.def = _QPfooEj}
   // ... use %i
   // ... use %j
}

llvm.dbg.declare also seems to require that there can only be one per variable, which makes optimizations where a variable may live in different storage/registers during its lifetime harder.

I do not think the same will be true of fir.declare. It requires its fir.def tag to be unique so that the defining op is always accessible in the hlfir ops using it, but two different tags could point to the same Fortran variable.
You could have:

%storage1 = fir.alloca i32
%i_1 = fir.declare %storage1 {fir.def = _QPfooEi}
// use %i_1.... debugger should be told "i" is in %i_1
// for some reason decide that "i" is now tracked as an element in some buffer:
%buffer = fir.alloca i32, 10
%storage2 = fir.coordinate_of %buffer, c3
%i_2 = fir.declare %storage2 {fir.def = _QPfooEi.opt1}
// use %i_2. debugger should be told "i" is in %i_2

I think this would translate into llvm.dbg.addr being emitted when changing the storage, but this deserves more design work about debug info generation.

At least from an HLFIR/FIR point of view, I see no issue with the two patterns above (two fir.declare sharing the same input, or two fir.declare referring to the same Fortran variable). I think fir.declare, despite the name, is more akin to llvm.dbg.addr, and one big advantage it has over both is that it returns an SSA value, while the others tag an existing SSA value. That makes variable info identifiable via the defining operation chain of an address, rather than via an analysis of its usages.

Thanks @jeanPerier. I don't have any further questions or comments. LGTM.

This revision is now accepted and ready to land.Oct 12 2022, 4:27 AM