Page MenuHomePhabricator

Prepare for inlining of SUM intrinsic
Needs ReviewPublic

Authored by Leporacanthicus on May 11 2022, 11:48 AM.



Find calls to FortranASum{Real8,Integer4}, check for dim and mask
arguments being absent - then produce an inlineable simple
version of the sum function.

This is a prototype, not a finished version, for the purpose
of defining a proper mechanism for inlining intrinsic functions
as a general optimisation.

Diff Detail

Event Timeline

Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
Leporacanthicus requested review of this revision.May 11 2022, 11:48 AM

Just some quick skimming through, I had a few comments, hope this helps! :)

41 ↗(On Diff #428733)

This line is unnecessary (also a helper like this does not really pull its weight: RAUW+erase is a common pattern)

48 ↗(On Diff #428733)

I think that technically this is a "partial_specialization", that would be a better suffix than "_inline" I believe.

Also Twine as local variable isn't recommended, it is a footgun.
You also don't gain anything since you will convert to string always below anyway, this should be equally fine:

std::string name = mlir::Twine{basename, "_inline"}.str();
52 ↗(On Diff #428733)

Same as the call-site: a symbol-table should be built once and kept around.

120 ↗(On Diff #428733)

The function does not do what the name says it does I believe, it looks at an operand.

122 ↗(On Diff #428733)

Can't val.getDefiningOp() return nullptr?

126 ↗(On Diff #428733)

(same as above)

129 ↗(On Diff #428733)

The usual way to match for zero would be:

return (matchPattern(value, m_Zero()))
136 ↗(On Diff #428733)

The two initializations here could move out of the walk (they're not free) and replaced by setting the insertion point on the builder.

137 ↗(On Diff #428733)

Nit: reduce indentation with early return/continue (here and elsewhere/below).

140 ↗(On Diff #428733)

This will be slow, a SymbolTable should be constructed on the module outside of the walk (and this method is not a good idea on the builder: it encourages anti-patterns like here)

Edit: actually this is even entirely unnecessary, you only need "callee".

141 ↗(On Diff #428733)

This is the same thing as "callee"

142 ↗(On Diff #428733)

Nit: specify the type instead of auto (here and elsewhere where it isn't redundant)

154 ↗(On Diff #428733)

Can you get the type from the IR? That is using the type of one of the operands or the result instead of doing string matching and rebuilding a type?

157 ↗(On Diff #428733)

(technically you can modify the call in-place, but this is fine as well.

Leporacanthicus marked an inline comment as not done.May 11 2022, 2:24 PM

Just some quick skimming through, I had a few comments, hope this helps! :)

Appreicate the comments. This is definitely a hacky version to show the basic idea. I'm sure there will be much bigger changes than the things you've pointed out, but I will try to make the changed you suggested too.

A lot of it is "How the heck do I do this", and at least in Flang, we don't have much "modify MLIR based on finding something". This is definitely the most complicated [even if it's really quite trivial] that I've done in this kind of thing. And yes, the various things may return NULL or similar, and not properly checked is definitely part of "this is not finished".

Appreciate that you spent the time to comment on this.

(I didn't make it explicit, but the overall approach makes sense to me)

peixin added a subscriber: peixin.Sat, Jun 18, 3:12 AM
peixin added inline comments.
89 ↗(On Diff #428733)

There should be one lambda with std::function argument. In lambda, you create the fir::DoLoopOp and handle the loop info. Or even make it one general function so that it can be used for other intrinsics. In the caller, do the special operations as the intrinsic needed. Here is one example (D127047), which borrows the code from FIR. As I remembered, there is some code handing the fir::DoLoopOp. You can check if there are some helper functions you can use directly. With one good lambda/helper function, you can even do this for multiple-D arrays.

Leporacanthicus marked 5 inline comments as done.

Updates in this revision:

  • Implement shape detection (so only work on 1D arrays)
  • Fix off-by-one error in loop (found when testing on SNAP)
  • Fix some review comments
  • Rebase to llvm/main instead of fir-dev

Still to resolve:

  • Tests
  • Optimisation setting to enable/disable
  • Probably fix some more review comments
Leporacanthicus marked 5 inline comments as done.Thu, Jun 23, 9:41 AM
Leporacanthicus added inline comments.
41 ↗(On Diff #428733)

Doh, missed the "don't need this line". Will fix in the next update.

48 ↗(On Diff #428733)

For now, I've changed it to "_simplfiied", which should be good enough to make it [reasonably] unique. Yes, it's a partial specialization - or a simplification compared to the complete runtime function... :)

52 ↗(On Diff #428733)

I'm not entirely sure precisely what you are asking for here. To keep a separate name -> function mapping (DenseMap or similar), or just keep some variable around in some other fashion?

89 ↗(On Diff #428733)

I agree that the future design will have some form of split between a generic "create a new function" and "build the content for a particular intrinsic".

I think the right time to make that split is when we add a different of intrinsic for simplification, and we can better see what is shared and what isn't shared at that point. There may be more than one such split.

136 ↗(On Diff #428733)

For some reason, it doesn't work when I move the builder out of the loop, it crashes on getNamedFunction(callee). Not sure if I'm doing something wrong. I moved the builder creation so that it only gets created when it's actually required.

141 ↗(On Diff #428733)

I've rewritten the whole section of this code... Thanks for the "nudge" to rewrite it.

142 ↗(On Diff #428733)

I _think_ I've removed all auto except where it's obvious.

154 ↗(On Diff #428733)

Not done this for now. I think it's a good idea, so I will have a play with it over the next day or so [or hours, if it turns out real easy].

Hey Mats, thanks for working on this!

This is a prototype, not a finished version

Is this still the case? If not - could you update the summary? Folks might be hesitant to review otherwise.

Before diving into details, could you add some tests? Otherwise, it's not clear what the intended workflow for this is. Also, there might be some design details still requiring ironing out and I think that tests would clarify this.