Page MenuHomePhabricator

Implement an scf.for range folding optimization pass.
ClosedPublic

Authored by Anthony on Jun 15 2021, 4:00 AM.

Details

Summary

Implement an scf.for range folding optimization pass.

In cases where arithmetic (addi/muli) ops are performed on an scf.for loops induction variable with a single use, we can fold those ops directly into the scf.for loop.

For example, in the following code:

scf.for %i = %c0 to %arg1 step %c1 {
  %0 = addi %arg2, %i : index
  %1 = muli %0, %c4 : index
  %2 = memref.load %arg0[%1] : memref<?xi32>
  %3 = muli %2, %2 : i32
  memref.store %3, %arg0[%1] : memref<?xi32>
}

we can lift %0 up into the scf.for loop range, as it is the only user of %i:

%lb = addi %arg2, %c0 : index
%ub = addi %arg2, %i : index
scf.for %i = %lb to %ub step %c1 {
  %1 = muli %0, %c4 : index
  %2 = memref.load %arg0[%1] : memref<?xi32>
  %3 = muli %2, %2 : i32
  memref.store %3, %arg0[%1] : memref<?xi32>
}

Diff Detail

Event Timeline

Anthony created this revision.Jun 15 2021, 4:00 AM
Anthony requested review of this revision.Jun 15 2021, 4:00 AM
Anthony edited the summary of this revision. (Show Details)Jun 15 2021, 4:06 AM
Anthony added a reviewer: mehdi_amini.
ftynse added a subscriber: ftynse.Jun 15 2021, 5:16 AM
ftynse added inline comments.
mlir/include/mlir/Dialect/SCF/Passes.td
49

This can be a function pass. Function passes can run in parallel on different functions.

50

This shouldn't be empty.

mlir/lib/Dialect/SCF/Transforms/LoopRangeFolding.cpp
2

Please add the license header.

21

We mark functions local to a translation module as static.

49

There is a version of OpBuilder::clone that takes a BlockAndValueMapping. It should allow you to create a copy of use operation without casting it to the specific type, and without manually calling lookupAndDefault.

mlir/test/Dialect/SCF/loop-range.mlir
129

Please add the newline

mravishankar requested changes to this revision.Jun 15 2021, 10:32 AM
mravishankar added a subscriber: mravishankar.

Please use clang-format on the patch. There seems to be a lot of linter errors.

mlir/include/mlir/Dialect/SCF/Passes.h
38

Missing comment about the pass description.

mlir/lib/Dialect/SCF/Transforms/LoopRangeFolding.cpp
36

Simpler to just use indVar.getUser().

Also naming Nit. use typically refers to the OpOperand & in the operation using the value. The Operation using the value is typically called user.

88

I think if you want to do a fixed point iteration it might be better to just redo this as a pattern on scf.for operations.

The pattern can fail if the induction variable has more than one use, or if it is used in anything other than an add or a mul. If that is not the case you can,

  1. Create a new scf.for operation with different lb, ub
  2. Use inlineRegionBefore to move the body of the original scf.for as a region of the new op.
  3. Replace uses of the original add/mul with the induction variable.
mlir/test/Dialect/SCF/loop-range.mlir
19

Convention on checks is to use %[[ARG0:.*]] instead of [[ARG0:%.*]]. It makes some of the other checks easier to write. For example, when you want to match [%v] . With %[[ARG0:.*]], ARG0 is set to v. So you can just do [%[[ARG0]]].
OTOH when using [[ARG0:%.*]] you will need to match [[[ARG0]]]. The [[[ ]]] conflicts with the check parsing.

This revision now requires changes to proceed.Jun 15 2021, 10:32 AM
Anthony updated this revision to Diff 352419.Jun 16 2021, 6:31 AM
Anthony marked 4 inline comments as done.

Requested changes, minus the rewrite pattern.

Updating D104289: First crack at foor loop range folding pass.

Is there any reason to make this separate pass instead of ForOp canonicalization? Also, diff looks incomplete

Anthony accepted this revision.EditedJun 18 2021, 9:49 AM
Anthony marked 4 inline comments as done.

I have made the changes requested, with a few inline comments to ask for further clarification.

(If I was supposed to submit as a planed revision or something else instead, my apologies)

mlir/lib/Dialect/SCF/Transforms/LoopRangeFolding.cpp
36

I didn't see a getUser for Value, only getUsers, am I missing something?

49

Thanks for the tip. I used your suggested OpBuilder::clone method to simplify a bit. However, I still need to distinguish between AddIOp and MulIOp because the former does not fold into the for loop step, while the latter does.

88

Thanks for the comments. So I can rewrite this into a rewrite pattern, but I have one question: Does a rewrite pattern on ForOp guarantee that the innermost loop will get rewritten first? I believe that is an invariant of the transform as well.

I can come up with a test case that better demonstrates the snag I hit without the inner-most transform.

Anthony marked 5 inline comments as done.Jun 18 2021, 9:53 AM

It looks like the diff was updated the wrong way: now it only contains the difference between the initial version and the updated version, rather than the difference between the updated version and the mainline. If you created a separate git commit for the changes, you need to do arc diff HEAD^^ (as many ^ as commits in your branch) or arc diff <hash-of-the-commit-on-the-main-branch>.

Anthony updated this revision to Diff 353335.EditedJun 21 2021, 4:18 AM

Sorry about that, I believe I have it correct now.

ftynse accepted this revision.Jun 21 2021, 4:25 AM
ftynse added inline comments.
mlir/test/Dialect/SCF/loop-range.mlir
2

We prefer to only test one pass unless strictly necessary otherwise. Helps debugging.

Is there any reason to make this separate pass instead of ForOp canonicalization?

Was this answered?

Also please update the title to be descriptive, and update the description to reflect what the commit message will be.

Anthony added a comment.EditedJun 21 2021, 12:55 PM

Is there any reason to make this separate pass instead of ForOp canonicalization?

Was this answered?

Also please update the title to be descriptive, and update the description to reflect what the commit message will be.

Sorry, I missed this. I followed the pattern for the loop invariant code motion pass, which is not a canonicalization on for. For this choice, I actually am not sure whether this would be better off as a canonicalization, or as its own pass. Can I have some guidance here?

(Will update description and commit messages).

Thinking about canonicalization, in the past I've been working with a canonical form for loops that would try to form the iteration space so that it starts at 0 and has a step increment of one, which may be the opposite of this transformation.

So having this as a separate pass makes sense, but I'd be cautious and keep in mind that this may just be as a lowering step which could be undone by canonicalization.

mlir/lib/Dialect/SCF/Transforms/LoopRangeFolding.cpp
47
63–64

(keep auto for when it improves readability and/or it is obvious from the context, same below in this patch)

Can you fix the foor -> for typo in the revision title? Also "scf for" in the title will be useful.

Anthony updated this revision to Diff 353590.Jun 22 2021, 3:04 AM

Implement an scf.for range folding optimization pass.

In cases where arithmetic (addi/muli) ops are performed on an scf.for
loops induction variable with a single use, we can fold those ops
directly into the scf.for loop.

For example, in the following code:

scf.for %i = %c0 to %arg1 step %c1 {
  %0 = addi %arg2, %i : index
  %1 = muli %0, %c4 : index
  %2 = memref.load %arg0[%1] : memref<?xi32>
  %3 = muli %2, %2 : i32
  memref.store %3, %arg0[%1] : memref<?xi32>
}

we can lift %0 up into the scf.for loop range, as it is the only user
of %i:

%lb = addi %arg2, %c0 : index
%ub = addi %arg2, %i : index
scf.for %i = %lb to %ub step %c1 {
  %1 = muli %0, %c4 : index
  %2 = memref.load %arg0[%1] : memref<?xi32>
  %3 = muli %2, %2 : i32
  memref.store %3, %arg0[%1] : memref<?xi32>
}
Anthony marked 3 inline comments as done.Jun 22 2021, 3:10 AM

Thinking about canonicalization, in the past I've been working with a canonical form for loops that would try to form the iteration space so that it starts at 0 and has a step increment of one, which may be the opposite of this transformation.

So having this as a separate pass makes sense, but I'd be cautious and keep in mind that this may just be as a lowering step which could be undone by canonicalization.

Thanks, got it. I'm happy to change the pass in the future if it turns we need it as a canonicalization.

I've also made the requested changes, and added another test case (see description in loop-range.mlir).

I believe there is still the question of whether or not this is better off as a rewrite pattern per @mravishankar comments.

Anthony retitled this revision from First crack at foor loop range folding pass. to Implement an scf.for range folding optimization pass..Jun 22 2021, 5:48 AM
mehdi_amini accepted this revision.Jun 22 2021, 1:04 PM

LGTM as-is, with a few comments.

mlir/include/mlir/Dialect/SCF/Passes.td
49

Can you actually make it an Operation pass: there is no need to anchor this to a function I believe.

mlir/lib/Dialect/SCF/Transforms/LoopRangeFolding.cpp
86

(After you make it an Operation pass, this will be just getOperation()->walk()

87

foldRanges can't fail, can you make it return void?
At this point I would inline the body here as well.

Oh and if you can update the description here to match what will be the commit description please.

LGTM as-is, with a few comments.

I can change it to operation pass---originally I had it as one, but changed it to a function pass at the request of another reviewer @ftynse so that it can be run in parallel on functions. Do we still want to go back to operation pass?

(I'll update description as well).

LGTM as-is, with a few comments.

I can change it to operation pass---originally I had it as one, but changed it to a function pass at the request of another reviewer @ftynse so that it can be run in parallel on functions. Do we still want to go back to operation pass?

(I'll update description as well).

An OperationPass is strictly better, because it can run on different operation types (think different kinds of FuncOp). Being an OperationPass doesn't change how the parallelization works, because the pass should still be scheduled per FuncOp. The only potential reason why parallelization would be hurt, is if a user accidentally schedules it on a ModuleOp (which is allowed for generic OperationPass').

To extend on what River describes with an example:

-pass-pipeline='func(for-loop-range-folding)' -> runs in parallel on function
`-pass-pipeline='module(for-loop-range-folding)' -> won't run in parallel.

The point is that making it an operation pass also allows to do:

-pass-pipeline='gpu.func(for-loop-range-folding)' -> run in parallel on GPU functions!

Anthony updated this revision to Diff 353894.Jun 23 2021, 2:07 AM

Implement an scf.for range folding optimization pass.

In cases where arithmetic (addi/muli) ops are performed on an scf.for
loops induction variable with a single use, we can fold those ops
directly into the scf.for loop.

For example, in the following code:

scf.for %i = %c0 to %arg1 step %c1 {
  %0 = addi %arg2, %i : index
  %1 = muli %0, %c4 : index
  %2 = memref.load %arg0[%1] : memref<?xi32>
  %3 = muli %2, %2 : i32
  memref.store %3, %arg0[%1] : memref<?xi32>
}

we can lift %0 up into the scf.for loop range, as it is the only user
of %i:

%lb = addi %arg2, %c0 : index
%ub = addi %arg2, %i : index
scf.for %i = %lb to %ub step %c1 {
  %1 = muli %0, %c4 : index
  %2 = memref.load %arg0[%1] : memref<?xi32>
  %3 = muli %2, %2 : i32
  memref.store %3, %arg0[%1] : memref<?xi32>
}
Anthony edited the summary of this revision. (Show Details)Jun 23 2021, 2:09 AM
Anthony edited the summary of this revision. (Show Details)
Anthony marked 3 inline comments as done.Jun 23 2021, 2:16 AM

Thanks for the clarifications guys. I've gone ahead and switched it to an operation pass, and inlined the foldRanges function into the operation walk.

Thanks for the clarifications guys.

As a nit you likely intend to use "guys" in a neutral way, but nowadays it is quite controversial. It may be better to avoid it.

I've gone ahead and switched it to an operation pass, and inlined the foldRanges function into the operation walk.

Thanks! LG, do you need someone to land this for you?

Thanks for the clarifications guys.

As a nit you likely intend to use "guys" in a neutral way, but nowadays it is quite controversial. It may be better to avoid it.

Thanks for pointing that out---I'll be more conscientious about my wording in the future.

I've gone ahead and switched it to an operation pass, and inlined the foldRanges function into the operation walk.

Thanks! LG, do you need someone to land this for you?

Yes please! This is my first commit.

This revision was not accepted when it landed; it landed in state Needs Review.Jun 23 2021, 6:07 PM
This revision was automatically updated to reflect the committed changes.