This is an archive of the discontinued LLVM Phabricator instance.

[MLIR][GPU] Properly model step in parallel loop to gpu conversion.
ClosedPublic

Authored by herhut on Feb 24 2020, 7:34 AM.

Details

Summary

The original patch had TODOs to add support for step computations,
which this commit addresses. The computations are expressed using
affine expressions so that the affine canonicalizers can simplify
the full bound and index computations.
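As a rough illustration (not the exact code from this patch), folding the step into a single affine expression means the mapped index can be produced by one `affine.apply`, where `%pid`, `%lb`, and `%step` are hypothetical names for the processor id, loop lower bound, and step:

```mlir
// Sketch: index = lowerBound + pid * step, expressed as one affine map
// so the affine canonicalizers can simplify the combined computation.
%idx = affine.apply affine_map<(d0)[s0, s1] -> (s0 + d0 * s1)>(%pid)[%lb, %step]
```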

Also cleans up the code a little and exposes the pass in the
header file.

Diff Detail

Event Timeline

herhut created this revision. Feb 24 2020, 7:34 AM
herhut updated this revision to Diff 246410. Feb 25 2020, 3:30 AM

Simplify affine constant matching.

bondhugula added inline comments.
mlir/test/Conversion/LoopsToGPU/parallel_loop.mlir
230

Side question: why aren't we using affine.load/store instead of load/store, and affine.parallel instead of loop.parallel here? With the former, you get things like store-to-load forwarding, redundant load elimination, and composition of ops supplying subscript values into the load/store itself; infrastructure for all of this exists whenever you need it. All the mapping metadata should fit nicely into affine.parallel as well.

ftynse accepted this revision. Feb 25 2020, 4:18 AM
ftynse added inline comments.
mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp
537

Nit: could you early-return instead?

655

Nit "we use affine apply" ?

mlir/test/Conversion/LoopsToGPU/parallel_loop.mlir
18

Let's not match on attribute names here, i.e. prefer #[[MAP0:.*]] = affine_map<...
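To illustrate the suggestion above (a hypothetical pattern, not taken from the actual test file): capture the generated map under a FileCheck variable instead of matching MLIR's auto-assigned aliases such as `#map0`, which are not stable:

```mlir
// CHECK: #[[MAP0:.*]] = affine_map<(d0)[s0, s1] -> (s0 + d0 * s1)>
// ...
// CHECK: affine.apply #[[MAP0]](%{{.*}})[%{{.*}}, %{{.*}}]
```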

This revision is now accepted and ready to land. Feb 25 2020, 4:18 AM
herhut updated this revision to Diff 246414. Feb 25 2020, 4:36 AM
herhut marked 4 inline comments as done.

Review comments.

Thanks for the review!

mlir/test/Conversion/LoopsToGPU/parallel_loop.mlir
230

It is not pure coincidence that the mapping data fits :)

My hope is that this mapper will work equally well with affine.parallel. However, I do not want to restrict it to affine, and the code we currently feed into this is not based on affine.parallel. I expect that we will generalize things in that direction eventually, but I would also be very happy if someone else looks into that.

This revision was automatically updated to reflect the committed changes.
bondhugula added inline comments. Feb 25 2020, 7:53 AM
mlir/test/Conversion/LoopsToGPU/parallel_loop.mlir
230

However, I do not want to restrict it to affine and currently the code
we feed into this is not based on affine.parallel. I expect that we will

"The code we feed into this": is the thing that generates the loop.parallel ops available somewhere, or is it something that's planned for release in the future?

generalize things in that direction eventually but would also be very
happy if someone else looks into that.

But in order to do that, one would also have to look at the converter that generates the loop dialect ops and switch it to affine dialect ones. IMO, that would prevent duplicating a lot of infrastructure in less powerful ways. All of these examples can be represented and transformed (whether or not you need any analysis) with the affine dialect.