This is an archive of the discontinued LLVM Phabricator instance.

Differential D104289

Implement an scf.for range folding optimization pass.
Closed, Public

Authored by Anthony on Jun 15 2021, 4:00 AM.

Details

Reviewers

mehdi_amini
ftynse
mravishankar
Anthony

Commits

rG3f429e82d3ea: Implement an scf.for range folding optimization pass.

Summary

Implement an scf.for range folding optimization pass.

In cases where an arithmetic op (addi or muli) is the only use of an scf.for loop's induction variable, and its other operands are defined outside the loop, we can fold that op directly into the scf.for loop range.

For example, in the following code:

scf.for %i = %c0 to %arg1 step %c1 {
  %0 = addi %arg2, %i : index
  %1 = muli %0, %c4 : index
  %2 = memref.load %arg0[%1] : memref<?xi32>
  %3 = muli %2, %2 : i32
  memref.store %3, %arg0[%1] : memref<?xi32>
}

we can lift %0 up into the scf.for loop range, as it is the only user of %i:

%lb = addi %arg2, %c0 : index
%ub = addi %arg2, %arg1 : index
scf.for %i = %lb to %ub step %c1 {
  %1 = muli %i, %c4 : index
  %2 = memref.load %arg0[%1] : memref<?xi32>
  %3 = muli %2, %2 : i32
  memref.store %3, %arg0[%1] : memref<?xi32>
}
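A minimal sketch of why the fold preserves behavior, written as plain C++ rather than MLIR. The types `Loop` and `ArithOp` and the functions `bodyIndices` and `foldChain` are hypothetical and exist only for this illustration. The sketch models `scf.for` as the half-open range `[lb, ub)` and folds a chain of `addi`/`muli` ops on the induction variable into the bounds one op at a time, in the spirit of the pass's fixed-point loop. Adding a loop-invariant `c` shifts both bounds; multiplying by `c` scales the lower bound, the upper bound, and the step. It assumes every multiplier is positive, since a zero or negative factor breaks this bound arithmetic.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy model, not MLIR code: an scf.for-like loop that runs
// i over the half-open range [lb, ub) with a positive step.
struct Loop {
  int64_t lb, ub, step;
};

// The induction variable's single use: either `v + c` or `v * c`, where c is
// loop-invariant (like %arg2 or %c4). Multipliers are assumed to be > 0.
enum class Kind { Add, Mul };
struct ArithOp {
  Kind kind;
  int64_t c;
};

// Values the loop body sees after applying the chain `ops` to each i.
std::vector<int64_t> bodyIndices(const Loop &loop,
                                 const std::vector<ArithOp> &ops) {
  std::vector<int64_t> out;
  for (int64_t i = loop.lb; i < loop.ub; i += loop.step) {
    int64_t v = i;
    for (const ArithOp &op : ops)
      v = (op.kind == Kind::Add) ? v + op.c : v * op.c;
    out.push_back(v);
  }
  return out;
}

// Fold each op of the chain into the loop bounds, one at a time.
Loop foldChain(Loop loop, const std::vector<ArithOp> &ops) {
  for (const ArithOp &op : ops) {
    if (op.kind == Kind::Add) {
      // i + c over [lb, ub) step s equals j over [lb + c, ub + c) step s.
      loop.lb += op.c;
      loop.ub += op.c;
    } else {
      // i * c over [lb, ub) step s equals j over [lb * c, ub * c) step s * c.
      loop.lb *= op.c;
      loop.ub *= op.c;
      loop.step *= op.c;
    }
  }
  return loop;
}
```

With `%arg2 = 7`, `%arg1 = 10` and `%c4 = 4`, the chain from the example folds to the range `[28, 68)` with step 4, which visits the same indices as the original loop. Note that the `MulIOp` branch in the diff below updates only the upper bound and the step; in this model that agrees with the original loop only when the lower bound is zero at that point.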

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Anthony created this revision. (Jun 15 2021, 4:00 AM)

Herald added subscribers: dcaballe, cota, teijeong and 17 others. (Jun 15 2021, 4:00 AM)

Anthony requested review of this revision. (Jun 15 2021, 4:00 AM)

Herald added a project: Restricted Project. (Jun 15 2021, 4:00 AM)

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

Anthony edited the summary of this revision. (Jun 15 2021, 4:06 AM)

Anthony added a reviewer: mehdi_amini.

Harbormaster completed remote builds in B109260: Diff 352091. (Jun 15 2021, 5:08 AM)

ftynse added a subscriber: ftynse. (Jun 15 2021, 5:16 AM)

ftynse added inline comments.

mlir/include/mlir/Dialect/SCF/Passes.td
49	This can be a function pass. Function passes can run in parallel on different functions.
50	This shouldn't be empty.
mlir/lib/Dialect/SCF/Transforms/LoopRangeFolding.cpp
1	Please add the license header.
20	We mark functions local to a translation unit as `static`.
48	There is a version of `OpBuilder::clone` that takes a BlockAndValueMapping. It should allow you to create a copy of `use` operation without casting it to the specific type, and without manually calling `lookupOrDefault`.
mlir/test/Dialect/SCF/loop-range.mlir
84	Please add the newline
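ftynse's suggestion relies on cloning an op while remapping some of its operands. The toy C++ model below is not MLIR's API; `ToyOp` and `cloneWithMapping` are hypothetical names used only to show the idea. Each operand found in the mapping is substituted, every other operand is kept, and the op kind is preserved, so one generic clone per bound can stand in for the hand-written `b.create<AddIOp>(...)` and `b.create<MulIOp>(...)` calls of the first version.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy IR, not MLIR's API: values are named by strings and an op
// has a kind (e.g. "addi") and a list of operand names.
struct ToyOp {
  std::string kind;
  std::vector<std::string> operands;
};

// Copy `op`, substituting each operand that has an entry in `mapping` and
// keeping every other operand unchanged.
ToyOp cloneWithMapping(const ToyOp &op,
                       const std::map<std::string, std::string> &mapping) {
  ToyOp copy{op.kind, {}};
  for (const std::string &operand : op.operands) {
    auto it = mapping.find(operand);
    copy.operands.push_back(it != mapping.end() ? it->second : operand);
  }
  return copy;
}
```

Cloning `%0 = addi %arg2, %i` once with `%i` mapped to the lower bound and once with `%i` mapped to the upper bound yields the two new bound computations without inspecting the op's concrete type.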

ftynse added a reviewer: ftynse. (Jun 15 2021, 5:16 AM)

Please use clang-format on the patch. There seem to be a lot of linter errors.

mlir/include/mlir/Dialect/SCF/Passes.h
38	Missing comment about the pass description.
mlir/lib/Dialect/SCF/Transforms/LoopRangeFolding.cpp
35	Simpler to just use `indVar.getUser()`. Also naming Nit. `use` typically refers to the `OpOperand &` in the operation using the value. The `Operation` using the value is typically called `user`.
87	I think if you want to do a fixed point iteration it might be better to just redo this as a pattern on `scf.for` operations. The pattern can fail if the induction variable has more than one use, or if it is used in anything other than an `add` or a `mul`. If that is not the case, you can: (1) create a new `scf.for` operation with different `lb`, `ub`; (2) use `inlineRegionBefore` to move the body of the original `scf.for` as a region of the new op; (3) replace uses of the original `add`/`mul` with the induction variable.
mlir/test/Dialect/SCF/loop-range.mlir
18	Convention on checks is to use `%[[ARG0:.]]` instead of `[[ARG0:%.]]`. It makes some of the other checks easier to write. For example, when you want to match `[%v]`, with `%[[ARG0:.]]`, `ARG0` is set to `v`, so you can just write `[%[[ARG0]]]`. OTOH, when using `[[ARG0:%.]]` you will need to match `[[[ARG0]]]`, and the `[[[` `]]]` conflicts with the check parsing.

This revision now requires changes to proceed. (Jun 15 2021, 10:32 AM)

Made the requested changes, except for the rewrite pattern.

Updating D104289: First crack at foor loop range folding pass.

Harbormaster completed remote builds in B109505: Diff 352419. (Jun 16 2021, 6:32 AM)

Is there any reason to make this a separate pass instead of a ForOp canonicalization? Also, the diff looks incomplete.

I have made the changes requested, with a few inline comments to ask for further clarification.

(If I was supposed to submit as a planed revision or something else instead, my apologies)

mlir/lib/Dialect/SCF/Transforms/LoopRangeFolding.cpp
35	I didn't see a `getUser` for `Value`, only `getUsers`, am I missing something?
48	Thanks for the tip. I used your suggested `OpBuilder::clone` method to simplify a bit. However, I still need to distinguish between AddIOp and MulIOp because the former does not fold into the for loop step, while the latter does.
87	Thanks for the comments. So I can rewrite this into a rewrite pattern, but I have one question: Does a rewrite pattern on `ForOp` guarantee that the innermost loop will get rewritten first? I believe that is an invariant of the transform as well. I can come up with a test case that better demonstrates the snag I hit without the inner-most transform.
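On the ordering question: MLIR's `Operation::walk` visits nested operations in post-order by default, so the walk-based version of this pass reaches an inner `scf.for` before the loop that encloses it. The toy model below is plain C++, not MLIR; `LoopNode` and `postOrder` are hypothetical names. It shows that order on a two-loop nest shaped like the `fold_two_loops` test, and it makes no claim about the order in which a pattern driver would apply a rewrite pattern.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model of a loop nest, not MLIR code: each loop has a name
// and the loops nested directly in its body.
struct LoopNode {
  std::string name;
  std::vector<LoopNode> children;
};

// Post-order traversal: every nested loop is recorded before the loop that
// contains it, the order in which a post-order walk hands loops to a callback.
void postOrder(const LoopNode &node, std::vector<std::string> &order) {
  for (const LoopNode &child : node.children)
    postOrder(child, order);
  order.push_back(node.name);
}
```

For `scf.for %j { scf.for %i { ... } }`, the traversal records `%i` before `%j`, so the inner loop's bounds are already folded when the outer induction variable is inspected.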

Anthony marked 5 inline comments as done. (Jun 18 2021, 9:53 AM)

It looks like the diff was updated the wrong way: now it only contains the difference between the initial version and the updated version, rather than the difference between the updated version and the mainline. If you created a separate git commit for the changes, you need to run `arc diff HEAD^^` (with as many `^` as there are commits in your branch) or `arc diff <hash-of-the-commit-on-the-main-branch>`.

Sorry about that, I believe I have it correct now.

ftynse accepted this revision. (Jun 21 2021, 4:25 AM)

ftynse added inline comments.

mlir/test/Dialect/SCF/loop-range.mlir
1	We prefer to only test one pass unless strictly necessary otherwise. Helps debugging.

Harbormaster completed remote builds in B110173: Diff 353335. (Jun 21 2021, 4:58 AM)

In D104289#2827120, @Hardcode84 wrote:

Is there any reason to make this separate pass instead of ForOp canonicalization?

Was this answered?

Also please update the title to be descriptive, and update the description to reflect what the commit message will be.

In D104289#2831392, @mehdi_amini wrote:

In D104289#2827120, @Hardcode84 wrote:

Is there any reason to make this separate pass instead of ForOp canonicalization?

Was this answered?

Also please update the title to be descriptive, and update the description to reflect what the commit message will be.

Sorry, I missed this. I followed the pattern of the loop-invariant code motion pass, which is not a canonicalization on `scf.for`. I am actually not sure whether this would be better off as a canonicalization or as its own pass. Can I have some guidance here?

(Will update description and commit messages).

Thinking about canonicalization, in the past I've been working with a canonical form for loops that would try to form the iteration space so that it starts at 0 and has a step increment of one, which may be the opposite of this transformation.

So having this as a separate pass makes sense, but I'd be cautious and keep in mind that this may just be a lowering step, which could be undone by canonicalization.
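To make the contrast concrete, here is a minimal sketch of the normalized loop form described above, again as plain C++ rather than MLIR. `NormalizedLoop`, `normalize`, `normalizedIndices` and `originalIndices` are hypothetical names for illustration only. The normalized loop runs `k` from 0 with step 1 for `ceil((ub - lb) / step)` iterations and recomputes `i = lb + k * step` in the body, which reintroduces exactly the kind of arithmetic that range folding moves into the bounds. It assumes a positive step.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy model, not MLIR code: a loop normalized to start at 0
// with step 1, plus what is needed to rebuild the original index.
struct NormalizedLoop {
  int64_t tripCount;  // k runs over [0, tripCount) with step 1
  int64_t lb;         // original lower bound
  int64_t step;       // original step (assumed > 0)
};

// Normalize `for (i = lb; i < ub; i += step)`; the trip count is
// ceil((ub - lb) / step), or 0 for an empty range.
NormalizedLoop normalize(int64_t lb, int64_t ub, int64_t step) {
  int64_t trip = ub > lb ? (ub - lb + step - 1) / step : 0;
  return {trip, lb, step};
}

// Indices seen by the body of the normalized loop: i = lb + k * step.
std::vector<int64_t> normalizedIndices(const NormalizedLoop &loop) {
  std::vector<int64_t> out;
  for (int64_t k = 0; k < loop.tripCount; ++k)
    out.push_back(loop.lb + k * loop.step);
  return out;
}

// Indices seen by the body of the original, unnormalized loop.
std::vector<int64_t> originalIndices(int64_t lb, int64_t ub, int64_t step) {
  std::vector<int64_t> out;
  for (int64_t i = lb; i < ub; i += step)
    out.push_back(i);
  return out;
}
```

For example, the range `[28, 68)` with step 4 normalizes to 10 iterations of `k`, and rebuilding `i` from `k` yields the same indices. A canonicalization that always produced this form would therefore move the folded arithmetic back into the loop body.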

mlir/lib/Dialect/SCF/Transforms/LoopRangeFolding.cpp
46	https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/IR/Value.h#L209-L214
62–63	(keep `auto` for when it improves readability and/or it is obvious from the context, same below in this patch)

Can you fix the foor -> for typo in the revision title? Also, "scf for" in the title would be useful.

In D104289#2831536, @mehdi_amini wrote:

Thinking about canonicalization, in the past I've been working with a canonical form for loops that would try to form the iteration space so that it starts at 0 and has a step increment of one, which may be the opposite of this transformation.

So having this as a separate pass makes sense, but I'd be cautious and keep in mind that this may just be a lowering step, which could be undone by canonicalization.

Thanks, got it. I'm happy to change the pass in the future if it turns out we need it as a canonicalization.

I've also made the requested changes, and added another test case (see description in loop-range.mlir).

I believe there is still the question of whether or not this is better off as a rewrite pattern, per @mravishankar's comments.

Harbormaster completed remote builds in B110368: Diff 353590. (Jun 22 2021, 4:13 AM)

Anthony retitled this revision from "First crack at foor loop range folding pass." to "Implement an scf.for range folding optimization pass." (Jun 22 2021, 5:48 AM)

LGTM as-is, with a few comments.

mlir/include/mlir/Dialect/SCF/Passes.td
49	Can you actually make it an Operation pass: there is no need to anchor this to a function I believe.
mlir/lib/Dialect/SCF/Transforms/LoopRangeFolding.cpp
85	(After you make it an Operation pass, this will be just `getOperation()->walk(`)
86	`foldRanges` can't fail, can you make it return void? At this point I would inline the body here as well.

Oh, and if you can, please update the description here to match what will be the commit description.

In D104289#2834278, @mehdi_amini wrote:

LGTM as-is, with a few comments.

I can change it to an operation pass; originally I had it as one, but changed it to a function pass at the request of another reviewer, @ftynse, so that it can be run in parallel on functions. Do we still want to go back to an operation pass?

(I'll update description as well).

In D104289#2834495, @Anthony wrote:

In D104289#2834278, @mehdi_amini wrote:

LGTM as-is, with a few comments.

I can change it to an operation pass; originally I had it as one, but changed it to a function pass at the request of another reviewer, @ftynse, so that it can be run in parallel on functions. Do we still want to go back to an operation pass?

(I'll update description as well).

An OperationPass is strictly better, because it can run on different operation types (think different kinds of FuncOp). Being an OperationPass doesn't change how the parallelization works, because the pass should still be scheduled per FuncOp. The only potential reason why parallelization would be hurt is if a user accidentally schedules it on a ModuleOp (which is allowed for generic OperationPasses).

To expand on what River describes, with an example:

`-pass-pipeline='func(for-loop-range-folding)'` -> runs in parallel on functions.
`-pass-pipeline='module(for-loop-range-folding)'` -> won't run in parallel.

The point is that making it an operation pass also allows you to do:

`-pass-pipeline='gpu.func(for-loop-range-folding)'` -> runs in parallel on GPU functions!

Anthony edited the summary of this revision. (Jun 23 2021, 2:09 AM)

Anthony edited the summary of this revision.

Thanks for the clarifications guys. I've gone ahead and switched it to an operation pass, and inlined the foldRanges function into the operation walk.

Harbormaster completed remote builds in B110578: Diff 353894. (Jun 23 2021, 3:20 AM)

In D104289#2835402, @Anthony wrote:

Thanks for the clarifications guys.

As a nit you likely intend to use "guys" in a neutral way, but nowadays it is quite controversial. It may be better to avoid it.

I've gone ahead and switched it to an operation pass, and inlined the foldRanges function into the operation walk.

Thanks! LG, do you need someone to land this for you?

In D104289#2837490, @mehdi_amini wrote:

In D104289#2835402, @Anthony wrote:

Thanks for the clarifications guys.

As a nit you likely intend to use "guys" in a neutral way, but nowadays it is quite controversial. It may be better to avoid it.

Thanks for pointing that out---I'll be more conscientious about my wording in the future.

I've gone ahead and switched it to an operation pass, and inlined the foldRanges function into the operation walk.

Thanks! LG, do you need someone to land this for you?

Yes please! This is my first commit.

This revision was not accepted when it landed; it landed in state Needs Review. (Jun 23 2021, 6:07 PM)

Closed by commit rG3f429e82d3ea: Implement an scf.for range folding optimization pass. (authored by Anthony, committed by mehdi_amini).

This revision was automatically updated to reflect the committed changes.

mehdi_amini added a commit: rG3f429e82d3ea: Implement an scf.for range folding optimization pass..

I think we need to revisit this patch as it is broken for some very common cases. Left notes here: https://github.com/llvm/llvm-project/issues/56235

Herald added a project: Restricted Project. (Jun 26 2022, 3:38 PM)

Herald added subscribers: bzcheeseman, sdasgup3, wenzhicui, wrengr.

Revision Contents

Path	Size
mlir/include/mlir/Dialect/SCF/Passes.h	3 lines
mlir/include/mlir/Dialect/SCF/Passes.td	4 lines
mlir/lib/Dialect/SCF/Transforms/LoopRangeFolding.cpp	92 lines
mlir/test/Dialect/SCF/loop-range.mlir	56 lines

Diff 352419

mlir/include/mlir/Dialect/SCF/Passes.h

	(29 unchanged lines collapsed)
	/// Creates a pass that specializes parallel loop for unrolling and			/// Creates a pass that specializes parallel loop for unrolling and
	/// vectorization.			/// vectorization.
	std::unique_ptr<Pass> createParallelLoopSpecializationPass();			std::unique_ptr<Pass> createParallelLoopSpecializationPass();

	/// Creates a pass which tiles innermost parallel loops.			/// Creates a pass which tiles innermost parallel loops.
	std::unique_ptr<Pass>			std::unique_ptr<Pass>
	createParallelLoopTilingPass(llvm::ArrayRef<int64_t> tileSize = {});			createParallelLoopTilingPass(llvm::ArrayRef<int64_t> tileSize = {});

				/// Creates a pass which folds arith ops on induction variable into
				mravishankar: Missing comment about the pass description.
				/// loop range.
	std::unique_ptr<Pass> createForLoopRangeFoldingPass();			std::unique_ptr<Pass> createForLoopRangeFoldingPass();


	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Registration			// Registration
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	/// Generate the code for registering passes.			/// Generate the code for registering passes.
	#define GEN_PASS_REGISTRATION			#define GEN_PASS_REGISTRATION
	#include "mlir/Dialect/SCF/Passes.h.inc"			#include "mlir/Dialect/SCF/Passes.h.inc"

	} // namespace mlir			} // namespace mlir

	#endif // MLIR_DIALECT_SCF_PASSES_H_			#endif // MLIR_DIALECT_SCF_PASSES_H_

mlir/include/mlir/Dialect/SCF/Passes.td

(40 unchanged lines collapsed)
let options = [		let options = [
ListOption<"tileSizes", "parallel-loop-tile-sizes", "int64_t",		ListOption<"tileSizes", "parallel-loop-tile-sizes", "int64_t",
"Factors to tile parallel loops by",		"Factors to tile parallel loops by",
"llvm::cl::ZeroOrMore, llvm::cl::MiscFlags::CommaSeparated">		"llvm::cl::ZeroOrMore, llvm::cl::MiscFlags::CommaSeparated">
];		];
let dependentDialects = ["AffineDialect"];		let dependentDialects = ["AffineDialect"];
}		}

def SCFForLoopRangeFolding		def SCFForLoopRangeFolding
: Pass<"for-loop-range-folding"> {		: FunctionPass<"for-loop-range-folding"> {
		ftynse: This can be a function pass. Function passes can run in parallel on different functions.
		mehdi_amini: Can you actually make it an Operation pass: there is no need to anchor this to a function I believe.
let summary = "";		let summary = "Fold add/mul ops into loop range";
		ftynse: This shouldn't be empty.
let constructor = "mlir::createForLoopRangeFoldingPass()";		let constructor = "mlir::createForLoopRangeFoldingPass()";
}		}

#endif // MLIR_DIALECT_SCF_PASSES		#endif // MLIR_DIALECT_SCF_PASSES

mlir/lib/Dialect/SCF/Transforms/LoopRangeFolding.cpp

 //===- LoopRangeFolding.cpp - Code to perform loop range folding-----------===//

ftynse: Please add the license header.

+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//===----------------------------------------------------------------------===//
+// This file implements loop range folding.
+//===----------------------------------------------------------------------===//

 #include "PassDetail.h"
 #include "mlir/Dialect/SCF/Passes.h"
 #include "mlir/Dialect/SCF/SCF.h"
 #include "mlir/Dialect/SCF/Transforms.h"
 #include "mlir/Dialect/SCF/Utils.h"
 #include "mlir/Dialect/StandardOps/IR/Ops.h"
 #include "mlir/IR/BlockAndValueMapping.h"

ftynse: We mark functions local to a translation unit as `static`.

 using namespace mlir;
 using namespace mlir::scf;

 namespace {
 struct ForLoopRangeFolding
     : public SCFForLoopRangeFoldingBase<ForLoopRangeFolding> {
-  void runOnOperation() override;
+  void runOnFunction() override;
 };
-}
+} // namespace

-LogicalResult foldRanges(ForOp op) {
+static LogicalResult foldRanges(ForOp op) {
   // Fold until a fixed point is reached
   Value indVar = op.getInductionVar();

   auto canBeFolded = [&](Value value) {

mravishankar: Simpler to just use `indVar.getUser()`. Also naming nit: `use` typically refers to the `OpOperand &` in the operation using the value. The `Operation` using the value is typically called `user`.

Anthony (author): I didn't see a `getUser` for `Value`, only `getUsers`, am I missing something?

     return op.isDefinedOutsideOfLoop(value) || value == indVar;
   };

   while (true) {
     // If the induction variable is used more than once, we can't fold its arith
     // ops into the loop range
     if (!indVar.hasOneUse())
       break;

-    Operation *use = indVar.getUses().begin().getUser();
+    Operation *user = indVar.getUses().begin().getUser();

mehdi_amini: Suggested edit (see https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/IR/Value.h#L209-L214):
        break;
    -   Operation *user = indVar.getUses().begin().getUser();
    +   Operation *user = &indVar.getUsers().front();
        if (!isa<AddIOp, MulIOp>(user))

-    if (!isa<AddIOp, MulIOp>(use))
+    if (!isa<AddIOp, MulIOp>(user))
       break;

ftynse: There is a version of `OpBuilder::clone` that takes a BlockAndValueMapping. It should allow you to create a copy of `use` operation without casting it to the specific type, and without manually calling `lookupOrDefault`.

Anthony (author): Thanks for the tip. I used your suggested `OpBuilder::clone` method to simplify a bit. However, I still need to distinguish between AddIOp and MulIOp because the former does not fold into the for loop step, while the latter does.

-    if (!llvm::all_of(use->getOperands(), canBeFolded))
+    if (!llvm::all_of(user->getOperands(), canBeFolded))
       break;

     OpBuilder b(op);
-    BlockAndValueMapping lbMap; lbMap.map(indVar, op.lowerBound());
-    BlockAndValueMapping ubMap; ubMap.map(indVar, op.upperBound());
-    BlockAndValueMapping stepMap; stepMap.map(indVar, op.step());
-    if (auto addOp = dyn_cast<AddIOp>(use)) {
-      auto lbFold = b.create<AddIOp>(
-          op.getLoc(),
-          lbMap.lookupOrDefault(addOp.getOperand(0)),
-          lbMap.lookupOrDefault(addOp.getOperand(1)));
-      auto ubFold = b.create<AddIOp>(
-          op.getLoc(),
-          ubMap.lookupOrDefault(addOp.getOperand(0)),
-          ubMap.lookupOrDefault(addOp.getOperand(1)));
-      op.setLowerBound(lbFold);
-      op.setUpperBound(ubFold);
-      addOp.replaceAllUsesWith(indVar);
-      addOp.erase();
-    } else if (auto mulOp = dyn_cast<MulIOp>(use)) {
-      auto ubFold = b.create<MulIOp>(
-          op.getLoc(),
-          ubMap.lookupOrDefault(mulOp.getOperand(0)),
-          ubMap.lookupOrDefault(mulOp.getOperand(1)));
-      auto stepFold = b.create<MulIOp>(
-          op.getLoc(),
-          stepMap.lookupOrDefault(mulOp.getOperand(0)),
-          stepMap.lookupOrDefault(mulOp.getOperand(1)));
-      op.setUpperBound(ubFold);
-      op.setStep(stepFold);
-      mulOp.replaceAllUsesWith(indVar);
-      mulOp.erase();
-    }
+    BlockAndValueMapping lbMap;
+    lbMap.map(indVar, op.lowerBound());
+    BlockAndValueMapping ubMap;
+    ubMap.map(indVar, op.upperBound());
+    BlockAndValueMapping stepMap;
+    stepMap.map(indVar, op.step());
+
+    if (isa<AddIOp>(user)) {
+      auto lbFold = b.clone(*user, lbMap);
+      auto ubFold = b.clone(*user, ubMap);

mehdi_amini: Suggested edit:
        if (isa<AddIOp>(user)) {
    -     auto lbFold = b.clone(*user, lbMap);
    -     auto ubFold = b.clone(*user, ubMap);
    +     Operation *lbFold = b.clone(*user, lbMap);
    +     Operation *ubFold = b.clone(*user, ubMap);
          op.setLowerBound(lbFold->getResult(0));
(keep `auto` for when it improves readability and/or it is obvious from the context, same below in this patch)

+
+      op.setLowerBound(lbFold->getResult(0));
+      op.setUpperBound(ubFold->getResult(0));
+
+    } else if (isa<MulIOp>(user)) {
+      auto ubFold = b.clone(*user, ubMap);
+      auto stepFold = b.clone(*user, stepMap);
+
+      op.setUpperBound(ubFold->getResult(0));
+      op.setStep(stepFold->getResult(0));
+    }
+
+    ValueRange wrapIndvar(indVar);
+    user->replaceAllUsesWith(wrapIndvar);
+    user->erase();
   }

   return success();
 }

-void ForLoopRangeFolding::runOnOperation() {
-  getOperation()->walk([&](ForOp forOp) {
+void ForLoopRangeFolding::runOnFunction() {
+  getFunction().getOperation()->walk([&](ForOp forOp) {

mehdi_amini: (After you make it an Operation pass, this will be just `getOperation()->walk(`)

     if (failed(foldRanges(forOp)))

mehdi_amini: `foldRanges` can't fail, can you make it return void? At this point I would inline the body here as well.

       signalPassFailure();

mravishankar: I think if you want to do a fixed point iteration it might be better to just redo this as a pattern on `scf.for` operations. The pattern can fail if the induction variable has more than one use, or if it is used in anything other than an `add` or a `mul`. If that is not the case, you can: (1) create a new `scf.for` operation with different `lb`, `ub`; (2) use `inlineRegionBefore` to move the body of the original `scf.for` as a region of the new op; (3) replace uses of the original `add`/`mul` with the induction variable.

Anthony (author): Thanks for the comments. So I can rewrite this into a rewrite pattern, but I have one question: Does a rewrite pattern on `ForOp` guarantee that the innermost loop will get rewritten first? I believe that is an invariant of the transform as well. I can come up with a test case that better demonstrates the snag I hit without the inner-most transform.

   });
 }

 std::unique_ptr<Pass> mlir::createForLoopRangeFoldingPass() {
   return std::make_unique<ForLoopRangeFolding>();
 }

mlir/test/Dialect/SCF/loop-range.mlir

	// RUN: mlir-opt %s -pass-pipeline='func(for-loop-range-folding,canonicalize)' -split-input-file | FileCheck %s			// RUN: mlir-opt %s -pass-pipeline='func(for-loop-range-folding,canonicalize)' -split-input-file | FileCheck %s
				ftynse: We prefer to only test one pass unless strictly necessary otherwise. Helps debugging.

	func @fold_one_loop(%arg0: memref<?xi32>, %arg1: index, %arg2: index) {			func @fold_one_loop(%arg0: memref<?xi32>, %arg1: index, %arg2: index) {
	%c0 = constant 0 : index			%c0 = constant 0 : index
	%c1 = constant 1 : index			%c1 = constant 1 : index
	%c4 = constant 4 : index			%c4 = constant 4 : index
	scf.for %i = %c0 to %arg1 step %c1 {			scf.for %i = %c0 to %arg1 step %c1 {
	%0 = addi %arg2, %i : index			%0 = addi %arg2, %i : index
	%1 = muli %0, %c4 : index			%1 = muli %0, %c4 : index
	%2 = memref.load %arg0[%1] : memref<?xi32>			%2 = memref.load %arg0[%1] : memref<?xi32>
	%3 = muli %2, %2 : i32			%3 = muli %2, %2 : i32
	memref.store %3, %arg0[%1] : memref<?xi32>			memref.store %3, %arg0[%1] : memref<?xi32>
	}			}
	return			return
	}			}

	// CHECK-LABEL: func @fold_one_loop			// CHECK-LABEL: func @fold_one_loop
	// CHECK-SAME: ([[ARG0:%.*]]: {{.*}}, [[ARG1:%.*]]: {{.*}}, [[ARG2:%.*]]: {{.*}}			// CHECK-SAME: (%[[ARG0:.*]]: {{.*}}, %[[ARG1:.*]]: {{.*}}, %[[ARG2:.*]]: {{.*}}
				mravishankar: Convention on checks is to use `%[[ARG0:.]]` instead of `[[ARG0:%.]]`. It makes some of the other checks easier to write. For example, when you want to match `[%v]`, with `%[[ARG0:.]]`, `ARG0` is set to `v`, so you can just write `[%[[ARG0]]]`. OTOH, when using `[[ARG0:%.]]` you will need to match `[[[ARG0]]]`, and the `[[[` `]]]` conflicts with the check parsing.
	// CHECK: [[C4:%.*]] = constant 4 : index			// CHECK: %[[C4:.*]] = constant 4 : index
	// CHECK: [[I0:%.*]] = addi [[ARG2]], [[ARG1]] : index			// CHECK: %[[I0:.*]] = addi %[[ARG2]], %[[ARG1]] : index
	// CHECK: [[I1:%.*]] = muli [[I0]], [[C4]] : index			// CHECK: %[[I1:.*]] = muli %[[I0]], %[[C4]] : index
	// CHECK: scf.for [[I:%.*]] = [[ARG2]] to [[I1]] step [[C4]] {			// CHECK: scf.for %[[I:.*]] = %[[ARG2]] to %[[I1]] step %[[C4]] {
	// CHECK: [[I2:%.*]] = memref.load [[ARG0]]{{\[}}[[I]]			// CHECK: %[[I2:.*]] = memref.load %[[ARG0]]{{\[}}%[[I]]
	// CHECK: [[I3:%.*]] = muli [[I2]], [[I2]] : i32			// CHECK: %[[I3:.*]] = muli %[[I2]], %[[I2]] : i32
	// CHECK: memref.store [[I3]], [[ARG0]]{{\[}}[[I]]			// CHECK: memref.store %[[I3]], %[[ARG0]]{{\[}}%[[I]]

	func @fold_one_loop2(%arg0: memref<?xi32>, %arg1: index, %arg2: index) {			func @fold_one_loop2(%arg0: memref<?xi32>, %arg1: index, %arg2: index) {
	%c0 = constant 0 : index			%c0 = constant 0 : index
	%c1 = constant 1 : index			%c1 = constant 1 : index
	%c4 = constant 4 : index			%c4 = constant 4 : index
	%c10 = constant 10 : index			%c10 = constant 10 : index
	scf.for %j = %c0 to %c10 step %c1 {			scf.for %j = %c0 to %c10 step %c1 {
	scf.for %i = %c0 to %arg1 step %c1 {			scf.for %i = %c0 to %arg1 step %c1 {
	%0 = addi %arg2, %i : index			%0 = addi %arg2, %i : index
	%1 = muli %0, %c4 : index			%1 = muli %0, %c4 : index
	%2 = memref.load %arg0[%1] : memref<?xi32>			%2 = memref.load %arg0[%1] : memref<?xi32>
	%3 = muli %2, %2 : i32			%3 = muli %2, %2 : i32
	memref.store %3, %arg0[%1] : memref<?xi32>			memref.store %3, %arg0[%1] : memref<?xi32>
	}			}
	}			}
	return			return
	}			}

	// CHECK-LABEL: func @fold_one_loop2			// CHECK-LABEL: func @fold_one_loop2
	// CHECK-SAME: ([[ARG0:%.*]]: {{.*}}, [[ARG1:%.*]]: {{.*}}, [[ARG2:%.*]]: {{.*}}			// CHECK-SAME: (%[[ARG0:.*]]: {{.*}}, %[[ARG1:.*]]: {{.*}}, %[[ARG2:.*]]: {{.*}}
	// CHECK: [[C4:%.*]] = constant 4 : index			// CHECK: %[[C4:.*]] = constant 4 : index
	// CHECK: [[I0:%.*]] = addi [[ARG2]], [[ARG1]] : index			// CHECK: %[[I0:.*]] = addi %[[ARG2]], %[[ARG1]] : index
	// CHECK: [[I1:%.*]] = muli [[I0]], [[C4]] : index			// CHECK: %[[I1:.*]] = muli %[[I0]], %[[C4]] : index
	// CHECK: scf.for [[I:%.*]] = [[ARG2]] to [[I1]] step [[C4]] {			// CHECK: scf.for %[[I:.*]] = %[[ARG2]] to %[[I1]] step %[[C4]] {
	// CHECK: [[I2:%.*]] = memref.load [[ARG0]]{{\[}}[[I]]			// CHECK: %[[I2:.*]] = memref.load %[[ARG0]]{{\[}}%[[I]]
	// CHECK: [[I3:%.*]] = muli [[I2]], [[I2]] : i32			// CHECK: %[[I3:.*]] = muli %[[I2]], %[[I2]] : i32
	// CHECK: memref.store [[I3]], [[ARG0]]{{\[}}[[I]]			// CHECK: memref.store %[[I3]], %[[ARG0]]{{\[}}%[[I]]

	func @fold_two_loops(%arg0: memref<?xi32>, %arg1: index, %arg2: index) {			func @fold_two_loops(%arg0: memref<?xi32>, %arg1: index, %arg2: index) {
	%c0 = constant 0 : index			%c0 = constant 0 : index
	%c1 = constant 1 : index			%c1 = constant 1 : index
	%c4 = constant 4 : index			%c4 = constant 4 : index
	%c10 = constant 10 : index			%c10 = constant 10 : index
	scf.for %j = %c0 to %c10 step %c1 {			scf.for %j = %c0 to %c10 step %c1 {
	scf.for %i = %j to %arg1 step %c1 {			scf.for %i = %j to %arg1 step %c1 {
	%0 = addi %arg2, %i : index			%0 = addi %arg2, %i : index
	%1 = muli %0, %c4 : index			%1 = muli %0, %c4 : index
	%2 = memref.load %arg0[%1] : memref<?xi32>			%2 = memref.load %arg0[%1] : memref<?xi32>
	%3 = muli %2, %2 : i32			%3 = muli %2, %2 : i32
	memref.store %3, %arg0[%1] : memref<?xi32>			memref.store %3, %arg0[%1] : memref<?xi32>
	}			}
	}			}
	return			return
	}			}

	// CHECK-LABEL: func @fold_two_loops			// CHECK-LABEL: func @fold_two_loops
	// CHECK-SAME: ([[ARG0:%.*]]: {{.*}}, [[ARG1:%.*]]: {{.*}}, [[ARG2:%.*]]: {{.*}}			// CHECK-SAME: (%[[ARG0:.*]]: {{.*}}, %[[ARG1:.*]]: {{.*}}, %[[ARG2:.*]]: {{.*}}
	// CHECK: [[C10:%.*]] = constant 10 : index			// CHECK: %[[C10:.*]] = constant 10 : index
	// CHECK: [[C4:%.*]] = constant 4 : index			// CHECK: %[[C4:.*]] = constant 4 : index
	// CHECK: [[C1:%.*]] = constant 1 : index			// CHECK: %[[C1:.*]] = constant 1 : index
	// CHECK: [[I0:%.*]] = addi [[ARG2]], [[C10]] : index			// CHECK: %[[I0:.*]] = addi %[[ARG2]], %[[C10]] : index
	// CHECK: scf.for [[J:%.*]] = [[ARG2]] to [[I0]] step [[C1]] {			// CHECK: scf.for %[[J:.*]] = %[[ARG2]] to %[[I0]] step %[[C1]] {
	// CHECK: [[I1:%.*]] = addi [[ARG2]], [[ARG1]] : index			// CHECK: %[[I1:.*]] = addi %[[ARG2]], %[[ARG1]] : index
	// CHECK: [[I2:%.*]] = muli [[I1]], [[C4]] : index			// CHECK: %[[I2:.*]] = muli %[[I1]], %[[C4]] : index
	// CHECK: scf.for [[I:%.*]] = [[J]] to [[I2]] step [[C4]] {			// CHECK: scf.for %[[I:.*]] = %[[J]] to %[[I2]] step %[[C4]] {
	// CHECK: [[I3:%.*]] = memref.load [[ARG0]]{{\[}}[[I]]			// CHECK: %[[I3:.*]] = memref.load %[[ARG0]]{{\[}}%[[I]]
	// CHECK: [[I4:%.*]] = muli [[I3]], [[I3]] : i32			// CHECK: %[[I4:.*]] = muli %[[I3]], %[[I3]] : i32
	// CHECK: memref.store [[I4]], [[ARG0]]{{\[}}[[I]]			// CHECK: memref.store %[[I4]], %[[ARG0]]{{\[}}%[[I]]
				ftynse: Please add the newline.
	No newline at end of file
