
[mlir][SCF] Peel scf.for loops for even step division

Authored by springerm on Mon, Jul 12, 4:01 AM.



Add ForLoopBoundSpecialization pass, which specializes scf.for loops into a "main loop" where step divides the iteration space evenly and into an scf.if that handles the last iteration ("loop peeling").

This transformation is useful for vectorization and loop tiling. E.g., when vectorizing loads/stores, programs will spend most of their time in the main loop, in which only unmasked loads/stores are used. Slower masked loads/stores are used only in the last iteration (scf.if).
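As a rough illustration of the control flow only (plain Python rather than MLIR, and not the pass's actual implementation), the sketch below contrasts an unpeeled loop with a peeled one. The function names and the `body(iv, count)` callback are hypothetical, and the sketch assumes `step > 0` and `lb <= ub`.

```python
def original_loop(lb, ub, step, body):
    """Unpeeled loop: every iteration must handle a possibly partial chunk."""
    for iv in range(lb, ub, step):
        # The min() stands in for the masking/bounds check needed everywhere.
        body(iv, min(step, ub - iv))


def peeled_loop(lb, ub, step, body):
    """Peeled loop: full chunks in the main loop, the rest in a trailing 'if'.

    Assumes step > 0 and lb <= ub.
    """
    # Round the upper bound down so that step divides (new_ub - lb) evenly.
    new_ub = ub - (ub - lb) % step

    # Main loop: every iteration processes exactly `step` elements.
    for iv in range(lb, new_ub, step):
        body(iv, step)

    # Peeled part (the scf.if): at most one partial iteration.
    if new_ub < ub:
        body(new_ub, ub - new_ub)
```

Both functions invoke `body` with the same arguments in the same order; the peeled version simply makes it explicit that only the final call can see fewer than `step` elements.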

Subsequent commits will apply this transformation in the SparseDialect and in Linalg's loop tiling.


Event Timeline

springerm created this revision. Mon, Jul 12, 4:01 AM
springerm requested review of this revision. Mon, Jul 12, 4:01 AM
springerm updated this revision to Diff 357892. Mon, Jul 12, 4:09 AM

expand documentation

springerm updated this revision to Diff 358119. Mon, Jul 12, 7:09 PM

fix bug when loop has no results

mehdi_amini added inline comments. Mon, Jul 12, 9:07 PM

The surrounding passes aren't well documented either, but can you fill the "description" field here?


Wouldn't this need potentially more than one iteration here? (up to step-1 I think?).

Also, isn't this transformation just what the literature refers to as "loop peeling"?
If so, then please name it accordingly, including in a more descriptive commit title (like "Add a loop peeling pass to enable vectorization" or something like that).

springerm added inline comments. Mon, Jul 12, 11:06 PM

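I think one iteration should be enough. The pattern is designed in such a way that the main loop's upper bound is rounded down to the nearest multiple of the step size, so fewer than "step" elements remain for the peeled iteration. (That is for lb = 0. In the more general case: newUb = ub - (ub - lb) % step.)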
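To make the arithmetic concrete, here is a small Python sketch of the bound computation quoted above. The helper name `peel_bounds` is hypothetical, and the sketch assumes `step > 0` and `lb <= ub`; it is a check of the formula, not code from the patch.

```python
def peel_bounds(lb, ub, step):
    """Return (new_ub, remainder) for newUb = ub - (ub - lb) % step.

    Assumes step > 0 and lb <= ub; other inputs are rejected.
    """
    if step <= 0 or ub < lb:
        raise ValueError("expected step > 0 and lb <= ub")
    remainder = (ub - lb) % step  # elements left over after the main loop
    new_ub = ub - remainder       # step divides (new_ub - lb) evenly
    return new_ub, remainder


# Example with lb = 3, ub = 20, step = 5: the main loop runs at 3, 8, 13
# and stops at new_ub = 18; the two elements 18 and 19 are left over.
new_ub, remainder = peel_bounds(3, 20, 5)
```

Because the remainder is the result of a modulo by `step`, it is always smaller than `step`, which is why a single (possibly partial) peeled iteration covers it.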

Renamed the pass name etc. to "loop peeling".

address comments

springerm retitled this revision from [mlir][SCF] Specialize scf.for loops for even step division to [mlir][SCF] Peel scf.for loops for even step division. Mon, Jul 12, 11:08 PM
springerm edited the summary of this revision.
mehdi_amini added inline comments.

Oh right, because the body is already executing an entire (potentially partial) step.

bondhugula added inline comments. Wed, Jul 21, 8:24 PM

It should be possible to do this transformation without having to erase forOp. Could you do this in place, so that you won't need the output argument mainLoop? Operands of for ops can be updated. For example, the affine.for transformation utilities update the op in place and avoid erasing/allocating wherever possible.

springerm marked an inline comment as done. Sat, Jul 31, 6:43 AM
springerm added inline comments.

Good idea.

springerm updated this revision to Diff 363292. Sat, Jul 31, 6:44 AM
springerm marked an inline comment as done.

address comments

Update: I removed the part that simplifies affine.min ops inside the peeled loop and put it into a separate commit (see commit stack). Also, the rewrite logic now uses FlatAffineConstraints, which makes the transformation more general and robust. (The old implementation matched only a very specific kind of affine.min. The new implementation can handle various affine.min ops. See the unit test of that commit for more details.)

nicolasvasilache accepted this revision. Mon, Aug 2, 1:11 AM

Nice, thanks for tackling this!

This revision is now accepted and ready to land. Mon, Aug 2, 1:11 AM
This revision was landed with ongoing or failed builds. Mon, Aug 2, 6:34 PM
This revision was automatically updated to reflect the committed changes.