This is an archive of the discontinued LLVM Phabricator instance.

[flang] Support multidimensional reductions in SimplifyIntrinsicsPass.
ClosedPublic

Authored by vzakhari on Sep 13 2022, 5:26 PM.

Details

Summary

Create simplified functions for each rank with "x<rank>" suffix
that implement multidimensional reductions. To enable this I had to fix
an issue with taking incorrect box shape in cases of sliced embox/rebox.

Depends on D133818

Diff Detail

Event Timeline

vzakhari created this revision.Sep 13 2022, 5:26 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 13 2022, 5:26 PM
vzakhari requested review of this revision.Sep 13 2022, 5:26 PM

Again, thanks for the work here. It looks roughly like I had imagined [although I was kind of wishfully thinking that maybe we could just treat ND as 1D, just larger - at least for things where "ordering isn't important", such as sum or maxval]

flang/lib/Optimizer/Transforms/SimplifyIntrinsics.cpp
136

Not that it matters a huge amount, but isn't Fortran limiting number of dimensions to 6? And thus asking for 16 is a bit excessive?

138

Nit: the "rank must be positive" is a bit misleading for an unsigned value. :)

177

Would it make more sense to reverse this before the first for-loop (line 161), and then reverse it again before the second loop - makes it easier to read the for-loop, if nothing else.

194

Do we not need to do some "combine" here? I will give it a try in a bit, to see if I can prove this hypothesis.

vzakhari added inline comments.Sep 14 2022, 8:33 AM
flang/lib/Optimizer/Transforms/SimplifyIntrinsics.cpp
136

Flang's limit is actually 15, so I will change it to 15.

138

"Positive" does not include zero, so the message should be correct :) I will change it to "rank cannot be zero".

177

Sorry, I am not getting it. It is empty before the first for-loop (line 161), so there is nothing to reverse.

194

The "combine" happens via the loop block arguments. Basically, we may think of an inner loop as a function that takes the reduction value that is an input for the outer loop, and the result of the "function" is passed back to the outer loop.

vzakhari updated this revision to Diff 460113.Sep 14 2022, 8:39 AM

Again, thanks for the work here. It looks roughly like I had imagined [although I was kind of wishfully thinking that maybe we could just treat ND as 1D, just larger - at least for things where "ordering isn't important", such as sum or maxval]

If you mean that we could have used a single loop vs the loop nest, then I do not think it is easily possible. fir.coordinate_of provided with the multiple indices accounts for the dimension strides (which can be anything in case of slicing) - that we would have to do manually if using a single loop. Using the loop nest looks much cleaner to me and consistent with the rest of FIR generation.

Again, thanks for the work here. It looks roughly like I had imagined [although I was kind of wishfully thinking that maybe we could just treat ND as 1D, just larger - at least for things where "ordering isn't important", such as sum or maxval]

If you mean that we could have used a single loop vs the loop nest, then I do not think it is easily possible. fir.coordinate_of provided with the multiple indices accounts for the dimension strides (which can be anything in case of slicing) - that we would have to do manually if using a single loop. Using the loop nest looks much cleaner to me and consistent with the rest of FIR generation.

Also importantly, multidimensional iteration spaces and access coordinates are consistent with other aspects of MLIR, so more generally amenable to reusing the standard MLIR optimization passes rather than reinventing them.

vzakhari updated this revision to Diff 460131.Sep 14 2022, 9:09 AM

Fixed minor typo in a comment.

This revision is now accepted and ready to land.Sep 19 2022, 11:17 AM