This is an archive of the discontinued LLVM Phabricator instance.

Prepare for inlining of SUM intrinsic
ClosedPublic

Authored by Leporacanthicus on May 11 2022, 11:48 AM.

Details

Summary

Find calls to FortranASum{Real8,Integer4}, check for dim and mask
arguments being absent - then produce an inlineable simple
version of the sum function.

(No longer a prototype, please review for push to llvm/main - not sure how to make Phabricator update the review with actual commit message)

Diff Detail

Event Timeline

Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
Leporacanthicus requested review of this revision.May 11 2022, 11:48 AM

Just some quick skimming through, I had a few comments, hope this helps! :)

flang/lib/Optimizer/Transforms/InlineIntrinsics.cpp
41 ↗(On Diff #428733)

This line is unnecessary (also a helper like this does not really pull its weight: RAUW+erase is a common pattern)

48 ↗(On Diff #428733)

I think that technically this is a "partial_specialization", that would be a better suffix than "_inline" I believe.

Also Twine as local variable isn't recommended, it is a footgun.
You also don't gain anything since you will convert to string always below anyway, this should be equally fine:

std::string name = mlir::Twine{basename, "_inline"}.str();
52 ↗(On Diff #428733)

Same as the call-site: a symbol-table should be built once and kept around.

120 ↗(On Diff #428733)

The function does not do what the name says it does I believe, it looks at an operand.

122 ↗(On Diff #428733)

Can't val.getDefiningOp() return nullptr?

126 ↗(On Diff #428733)

(same as above)

129 ↗(On Diff #428733)

The usual way to match for zero would be:

return (matchPattern(value, m_Zero()))
136 ↗(On Diff #428733)

The two initializations here could move out of the walk (they're not free) and replaced by setting the insertion point on the builder.

137 ↗(On Diff #428733)

Nit: reduce indentation with early return/continue (here and elsewhere/below).

140 ↗(On Diff #428733)

This will be slow, a SymbolTable should be constructed on the module outside of the walk (and this method is not a good idea on the builder: it encourages anti-patterns like here)

Edit: actually this is even entirely unnecessary, you only need "callee".

141 ↗(On Diff #428733)

This is the same thing as "callee"

142 ↗(On Diff #428733)

Nit: specify the type instead of auto (here and elsewhere where it isn't redundant)

154 ↗(On Diff #428733)

Can you get the type from the IR? That is using the type of one of the operands or the result instead of doing string matching and rebuilding a type?

157 ↗(On Diff #428733)

(technically you can modify the call in-place, but this is fine as well.

Leporacanthicus marked an inline comment as not done.May 11 2022, 2:24 PM

Just some quick skimming through, I had a few comments, hope this helps! :)

Appreicate the comments. This is definitely a hacky version to show the basic idea. I'm sure there will be much bigger changes than the things you've pointed out, but I will try to make the changed you suggested too.

A lot of it is "How the heck do I do this", and at least in Flang, we don't have much "modify MLIR based on finding something". This is definitely the most complicated [even if it's really quite trivial] that I've done in this kind of thing. And yes, the various things may return NULL or similar, and not properly checked is definitely part of "this is not finished".

Appreciate that you spent the time to comment on this.

(I didn't make it explicit, but the overall approach makes sense to me)

peixin added a subscriber: peixin.Jun 18 2022, 3:12 AM
peixin added inline comments.
flang/lib/Optimizer/Transforms/InlineIntrinsics.cpp
89 ↗(On Diff #428733)

There should be one lambda with std::function argument. In lambda, you create the fir::DoLoopOp and handle the loop info. Or even make it one general function so that it can be used for other intrinsics. In the caller, do the special operations as the intrinsic needed. Here is one example (D127047), which borrows the code from FIR. As I remembered, there is some code handing the fir::DoLoopOp. You can check if there are some helper functions you can use directly. With one good lambda/helper function, you can even do this for multiple-D arrays.

Leporacanthicus marked 5 inline comments as done.

Updates in this revision:

  • Implement shape detection (so only work on 1D arrays)
  • Fix off-by-one error in loop (found when testing on SNAP)
  • Fix some review comments
  • Rebase to llvm/main instead of fir-dev

Still to resolve:

  • Tests
  • Optimisation setting to enable/disable
  • Probably fix some more review comments
Leporacanthicus marked 5 inline comments as done.Jun 23 2022, 9:41 AM
Leporacanthicus added inline comments.
flang/lib/Optimizer/Transforms/InlineIntrinsics.cpp
41 ↗(On Diff #428733)

Doh, missed the "don't need this line". Will fix in the next update.

48 ↗(On Diff #428733)

For now, I've changed it to "_simplfiied", which should be good enough to make it [reasonably] unique. Yes, it's a partial specialization - or a simplification compared to the complete runtime function... :)

52 ↗(On Diff #428733)

I'm not entirely sure precisely what you are asking for here. To keep a separate name -> function mapping (DenseMap or similar), or just keep some variable around in some other fashion?

89 ↗(On Diff #428733)

I agree that the future design will have some form of split between a generic "create a new function" and "build the content for a particular intrinsic".

I think the right time to make that split is when we add a different of intrinsic for simplification, and we can better see what is shared and what isn't shared at that point. There may be more than one such split.

136 ↗(On Diff #428733)

For some reason, it doesn't work when I move the builder out of the loop, it crashes on getNamedFunction(callee). Not sure if I'm doing something wrong. I moved the builder creation so that it only gets created when it's actually required.

141 ↗(On Diff #428733)

I've rewritten the whole section of this code... Thanks for the "nudge" to rewrite it.

142 ↗(On Diff #428733)

I _think_ I've removed all auto except where it's obvious.

154 ↗(On Diff #428733)

Not done this for now. I think it's a good idea, so I will have a play with it over the next day or so [or hours, if it turns out real easy].

Hey Mats, thanks for working on this!

This is a prototype, not a finished version

Is this still the case? If not - could you update the summary? Folks might be hesitant to review otherwise.

Before diving into details, could you add some tests? Otherwise, it's not clear what the intended workflow for this is. Also, there might be some design details still requiring ironing out and I think that tests would clarify this.

Leporacanthicus marked 3 inline comments as done.

Updates since last time:

  • Fix crash if no shape available.
  • Add basic tests

Mostly looks good to me. I've left some suggestions inline. The key thing would be to add more tests. And to update the description. In particular:

This is a prototype, not a finished version,

I'd add more reviewers once the summary is updated. Thanks for working on this!

flang/include/flang/Optimizer/Transforms/Passes.td
177–183

This name and description doesn't really reflect what is going on here, does it? This pass is more about injecting a specializations of intrinsics. In fact, it only deals with one intrinsic ATM :)

flang/lib/Optimizer/Transforms/SimplifyIntrinsics.cpp
35

A comment with some pseudo-code for the newly created function would be greatly appreciated!

38

Suggesting a more descriptive ... name :)

43–45
49

[nit]

166

Could you write a test for this case as well?

168–169

Could you write a test that will exercise this?

flang/test/Transforms/simplifyintrinsics.fir
5

Could you use more descriptive names for functions? For example, sum_1d_array.

The main difference between the two tests is that one adds vectors, and the other adds 2d arrays, right? Why not make it clearer (as opposed to cryptic names like _QPtest.

36

You will also want CHECK-NOT: _FortranASumInteger4.

91

Could you add // CHECK-NOT: _FortranASumInteger4_simplified somewhere here?

Leporacanthicus edited the summary of this revision. (Show Details)Jul 22 2022, 11:26 AM

FYI, I will be unavailable next week, so I will not be able to follow up on your comments.

flang/lib/Optimizer/Transforms/SimplifyIntrinsics.cpp
28

Since this function is currently SUM specific, can you please rename it to make it explicit? Please also add a header comment describing the parameters.

82

Is it possible that the input box has non-unit stride?

97

Is there some benefit in keeping the sum in memory? Can we make it a result of the loop and store it once at the end? No need to change it in this commit.

149

Please add comments describing either the pass in general or runOnOperation behavior.

clementval added inline comments.
flang/include/flang/Optimizer/Transforms/Passes.td
177

Is there a specific reason to run on ModuleOp instead of FuncOp?

177–183

+1

Leporacanthicus marked 10 inline comments as done.

Update based on review:

  • Add comments to code, change variable names.
  • Add two more tests to cover all branches of the code.
flang/include/flang/Optimizer/Transforms/Passes.td
177

Probably no good reason - although I'm not sure it makes a huge difference if it is or isn't - we need to walk all the code either way.

I tried changing it quickly, but it needs further changes than just changing the FuncOp, so I will look at changing it in the next update - I'm sure there will be at least one more iteration before this is approved - if nothing else because I've "tixed" this.

flang/lib/Optimizer/Transforms/SimplifyIntrinsics.cpp
28

The idea is that this will take a function as argument, and be made generic, but trying to keep it simple for the first generation of this code ("don't complicate the API for something you could need in the future")

35

Done, in pseudo-Fortran, next to where the actual function is inserted - it should probably be split into a separate function at some point (when we start adding more functions) in the future, so I prefer to have it next to the code that inserts the code.

38

Done, but with a different var name.

82

I will dig into this, I need a way to produce it first.

97

Good point, I will look at this in a follow-up patch (once the code is optimised by LLVM, it makes no real difference - and I suspect it ends up with an alloca + load/add/store in LLVM either way, but it makes sense to use the result of the do-loop, to keep things consistent).

149

Description of the entire pass - as well as brief description of the actual checks below.

clementval added inline comments.Jul 26 2022, 9:28 AM
flang/include/flang/Optimizer/Transforms/Passes.td
177

The advantage is that if you can run on FuncOp, multiple FuncOp can be processed in parallel.

Thanks for all the updates!

flang/lib/Optimizer/Transforms/SimplifyIntrinsics.cpp
9
181

I'm a bit unsure about this condition. What's args[3] and what's args[4]? And why isZero(args[3])?

flang/test/Transforms/simplifyintrinsics.fir
38

What if fir.call @_FortranASumInteger4({{.*}}) happens before fir.call @_FortranASumInteger4_simplified? Same for other instances of CHECK-NOT.

clementval added inline comments.Jul 26 2022, 10:37 AM
flang/include/flang/Optimizer/Transforms/Passes.td
177

And you can probably run directly on Operation since each call can be dealt with individually.

Leporacanthicus marked an inline comment as done.

Updates to review comments:

  • Add a few more comments, and update others.
  • Add extra check-not lines.
  • Change from mlir::ModuleOp to mlir::func::FuncOp for pass.
flang/include/flang/Optimizer/Transforms/Passes.td
177

I tried to use fir::CallOp, but it doesn't recognize that. If I try to include FIROps.h so it does recognize it, lots of errors occur, and I couldn't make that work at all. I tried mlir::Operation too, but I don't think that helps much, and I couldn't make it work.

I have updated to use `mlir::func::FuncOp' for now.

177

Does that mean you have to worry about locks and things updating the module? Or is that taken care of in some magic way? Or did I misunderstand "in parallel"?

flang/lib/Optimizer/Transforms/SimplifyIntrinsics.cpp
181

I've added another comment with the SUM C++ runtime prototype declaration, and added two variables to give names to args[3] and args[4]. As well as a comment to explain why args[1] and args[3] are ignored.

flang/test/Transforms/simplifyintrinsics.fir
38

Well, I guess I can add one before. But then what if it's two lines away? Or three? After all CHECK_NOT only checks between THIS and the NEXT line. Maybe the right thing is to add another run line using a NEWPREFIX, and ONLY have NEWPREFIX-NOT lines?

I don't know the right answer here, so I have just put one more CHECK-NOT line in each place.

clementval added inline comments.Jul 28 2022, 1:03 AM
flang/include/flang/Optimizer/Transforms/Passes.td
177

If you modify parent operations (outside of the function) then you might have problem.

You have more information here: https://mlir.llvm.org/docs/PassManagement/#operation-pass

Thanks for the updates and addressing my comments. I've left a few more, but in general this looks good to me and I will approve once these are addressed too.

flang/include/flang/Optimizer/Transforms/Passes.td
177–183

I think that it would make the description a bit more specific. Perhaps:

Qualifying intrinsic calls are replaced with calls to specialised and simplified intrinsic routines, which are inserted into the current module during the pass execution.

Or something similar :)

flang/lib/Optimizer/Transforms/SimplifyIntrinsics.cpp
182

SUM is not simplified here ;-) Instead, a call to SUM is replaced with a call to a simplified and specialised version of SUM.

206–207

It would be worth adding a test with two calls to SUM. This way you can verify that _FortranASumInteger4_simplified is only "injected" once.

Leporacanthicus marked 2 inline comments as done.

Updated test to cover "two calls in same module".

Amended a few comments.

Leporacanthicus marked an inline comment as done.Jul 28 2022, 12:44 PM
Leporacanthicus added inline comments.
flang/include/flang/Optimizer/Transforms/Passes.td
177

Thanks for the link. As per Slack discussion, to avoid breaking the rules, and adding functions to the module, this should be a mlir::ModuleOp pass - a comment is added to clarify this in the file too.

awarzynski accepted this revision.EditedJul 29 2022, 1:18 AM

LGTM, thanks again for working on this! Please wait for other active reviewers to approve as well before merging.

This revision is now accepted and ready to land.Jul 29 2022, 1:18 AM

LGTM, minor comments. Please let me know what you think!

flang/lib/Optimizer/Transforms/SimplifyIntrinsics.cpp
49

Two small comments -

  1. Can we make the function name more descriptive (getOrCreateSumFunction)?
  2. Also, it would be nice if we could return null value when this function is called requesting some other function (indicating that we do not generate that function yet). For example, if this function is called on max intrinsic, we will return implementation of sum and it would be nice if we could avoid that. Is it reliable to check the basename is to make this decision and return null?

Added comment to clarify generic function name.
Added assert to catch "wrong use" of function.
Added new test for strided array values.

peixin accepted this revision.Jul 30 2022, 2:09 AM

LGTM. Some nit comments.

BTW, do you consider add this option to the driver flang-new?

flang/include/flang/Optimizer/Transforms/Passes.td
179
flang/lib/Optimizer/Transforms/SimplifyIntrinsics.cpp
17

It may need multiple loops for multi-D array.

133

else if (integer type)

res = ...

else

TODO
208

Do you consider match the data type instead of matching the string? https://github.com/llvm/llvm-project/blob/7912b1f8e7c845a97411cbfc176db56861cdf116/flang/lib/Optimizer/Builder/Runtime/Reduction.cpp#L864-L902

It's ok to me if you want keep the current method. You can also consider this when expand the inline to more data types. Anyway, this is not hard comment and fell free to ignore this.

flang/test/Transforms/simplifyintrinsics.fir
38

No need to add CHECK-NOT after _FortranASumInteger4_simplified.

98

This is not necessary to add it after _FortranASumInteger4. The same for the following.

Leporacanthicus added inline comments.Aug 1 2022, 3:09 AM
flang/lib/Optimizer/Transforms/SimplifyIntrinsics.cpp
49

So, my idea is that this will be changed when we add support for more functions, to take a function (lambda or similar) that actually fills in the function, and that ALL generation of new functions go through this one function.

I have added a comment to explain that, and I have also added an assert to check the name string, just in case someone believes the funcition will magically work for PRODUCT, MAXVAL or some other intrinsic.

Leporacanthicus marked 2 inline comments as done.

Update for review comments:

  • add TODO to mark "this is not supported"
  • clarify comment that number of loops may depend on dimensions
flang/lib/Optimizer/Transforms/SimplifyIntrinsics.cpp
208

I will change that in the next version, when also introducing support for at least one more function - assuming there's no problems from that.

I have started on a follow-on patch, which refactors things for the purpose of supporting multiple functions.

flang/test/Transforms/simplifyintrinsics.fir
98

Can you explain why? I added it because we want to make sure it's not adding a new call.

peixin added inline comments.Aug 1 2022, 6:03 PM
flang/lib/Optimizer/Transforms/SimplifyIntrinsics.cpp
208

Sounds good to me.

flang/test/Transforms/simplifyintrinsics.fir
98

If you use CHECK-NEXT-NOT: in line 95, it will check if the next string has _FortranASumInteger4(.... If you use CHECK-NOT in line 95, it will check if there is one string of fir.call @_FortranASumInteger4(... after line 94 to the end. Check here.

awarzynski added inline comments.Aug 2 2022, 2:38 AM
flang/test/Transforms/simplifyintrinsics.fir
98

@peixin I'm not sure that this would work here. The docs only say that:

The “CHECK-NOT:” directive is used to verify that a string doesn’t occur between two matches (or before the first match, or after the last match).

Which means that the following:

// CHECK-LABEL:   func.func @sum_2d_array_int
// CHECK-NOT:       fir.call @_FortranASumInteger4_simplified
// CHECK:           fir.call @_FortranASumInteger4

will reject this (desirable):

func.func @sum_2d_array_int {
  fir.call @_FortranASumInteger4_simplified 
  fir.call @_FortranASumInteger4

but will happily accept this (not desirable) :

func.func @sum_2d_array_int {
   fir.call @_FortranASumInteger4
   fir.call @_FortranASumInteger4_simplified

Here's a small reproducer that you can play with:

  • input.f90
aaa
bbb
ccc
ddd
  • experiment.f90
! RUN: cat %S/input.f90 | FileCheck %s

! CHECK: aaa
! CHECK-NOT: ccc
! CHECK: bbb
! CHECK: ddd

This test passes for me, which demonstrates that ! CHECK-NOT: ccc is not sufficient to protect against ccc appearing *after* bbb.

I think the best way to avoid duplicating CHECK-NOT would be this pattern:

! CHECK-DAG: <something expected>
! CHECK-NOT:       fir.call @_FortranASumInteger4_simplified
! CHECK-DAG: <something expected, e.g. return>

This way you can specify the there should be no calls to _FortranASumInteger4_simplified between two surrounding CHECK-DAG. But, IIUC, you can't mix it with CHECK-LABEL, so this it not available here.

Does this make sense to you or am I confusing everything and everyone? :)

peixin added inline comments.Aug 2 2022, 4:22 AM
flang/test/Transforms/simplifyintrinsics.fir
98

Right. My understanding was not correct. Thanks.

vzakhari added inline comments.Aug 2 2022, 1:01 PM
flang/lib/Optimizer/Transforms/SimplifyIntrinsics.cpp
28

I see. Thanks for explanation.

198

typo

234

nit, style: please try to make the anonymous namespace shorter (https://llvm.org/docs/CodingStandards.html#anonymous-namespaces)

Leporacanthicus marked 2 inline comments as done.Aug 4 2022, 2:46 AM
Leporacanthicus added inline comments.
flang/lib/Optimizer/Transforms/SimplifyIntrinsics.cpp
234

Thanks for pointing that out. I changed the inline to static to make sure they are not interfering with other functions that may have same/similar names.

vzakhari added inline comments.Aug 4 2022, 11:43 AM
flang/lib/Optimizer/Transforms/SimplifyIntrinsics.cpp
234

Thanks! LGTM

This revision was automatically updated to reflect the committed changes.