
Details

Reviewers

herhut
nicolasvasilache
ftynse

Commits

rGbc1947a6f51f: Add a basic tiling pass for parallel loops

Summary

This exploits the fact that the iterations of parallel loops are
independent so tiling becomes just an index transformation. This pass
only tiles the innermost loop of a loop nest.

The ultimate goal is to allow vectorization of the tiled loops, but I
don't think we're there yet with the current rewriting, as the tiled
loops don't have a constant trip count.
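As a rough illustration of why tiling is "just an index transformation" here, the sketch below compares a 1-D loop with a tiled version of it. It is plain C++ rather than MLIR; the names `originalIndices` and `tiledIndices` are hypothetical, and clamping the inner bound with `step * tileSize` is an assumption of this sketch, not necessarily what the patch emits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Indices visited by the original loop: for (i = lb; i < ub; i += step).
std::vector<int64_t> originalIndices(int64_t lb, int64_t ub, int64_t step) {
  assert(step > 0 && "sketch assumes a positive step");
  std::vector<int64_t> visited;
  for (int64_t i = lb; i < ub; i += step)
    visited.push_back(i);
  return visited;
}

// Indices visited after tiling by tileSize. The outer loop advances by
// step * tileSize; the inner loop starts at 0 and is clamped so that
// outer + inner never reaches ub.
std::vector<int64_t> tiledIndices(int64_t lb, int64_t ub, int64_t step,
                                  int64_t tileSize) {
  assert(step > 0 && tileSize > 0 && "sketch assumes positive step and tile");
  std::vector<int64_t> visited;
  for (int64_t outer = lb; outer < ub; outer += step * tileSize) {
    // This bound depends on `outer`, so the inner trip count is not constant.
    int64_t innerBound = std::min(step * tileSize, ub - outer);
    for (int64_t inner = 0; inner < innerBound; inner += step)
      visited.push_back(outer + inner);
  }
  return visited;
}
```

Both functions visit the same indices; because the iterations are independent, no dependence analysis is needed to justify the rewrite. The clamped `innerBound` is also where the non-constant trip count mentioned above comes from.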

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

bkramer created this revision. Feb 21 2020, 4:31 AM

Herald added a reviewer: nicolasvasilache. Feb 21 2020, 4:31 AM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, Joonsoo, liufengdb and 12 others.

use pass option instead of global option
fix the recursive walker to walk recursively

ftynse added a reviewer: ftynse. Feb 21 2020, 5:00 AM

herhut requested changes to this revision. Feb 21 2020, 5:01 AM

herhut added inline comments.

mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopTiling.cpp
25	Please use pass options.
30	nit: Doc comments need to start with `///`. Use "parallel loop" instead of "ploop".
42	tileParallelLoop
44	Maybe `newSteps.reserve`?
68	I do not understand this part. We want the min of either the step size or dim - upperbound_of_outer.
77	You can do something like `innerLoop.region().takeBody(op.region())`, which simply transfers one region to the other. As the number of block arguments does not change, there is no need to fix uses or anything.
95	`getBlock().getOps<loop::ParallelOp>()` gives a (potentially empty) iterator over all parallel loops in the block.
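The traversal these comments touch on, collecting only the innermost parallel loops, can be sketched without MLIR. `LoopNode` and `collectInnermost` below are hypothetical stand-ins for `ParallelOp` and the pass helper; the sketch assumes, as the patch does, that loops are only directly nested.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a parallel loop op; not an MLIR type.
struct LoopNode {
  std::string name;
  std::vector<LoopNode> children; // loops nested directly in the body
};

// Appends the name of every loop that contains no nested loop to `innermost`.
// Returns true if `loops` contained at least one loop.
bool collectInnermost(const std::vector<LoopNode> &loops,
                      std::vector<std::string> &innermost) {
  bool hasLoop = false;
  for (const LoopNode &loop : loops) {
    hasLoop = true;
    // A loop is innermost exactly when its body reports no loops.
    if (!collectInnermost(loop.children, innermost))
      innermost.push_back(loop.name);
  }
  return hasLoop;
}
```

For a nest with one loop inside another plus a sibling loop at the top level, this selects the nested loop and the sibling, which is the shape exercised by the `tile_nested_innermost` test in the diff.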

This revision now requires changes to proceed. Feb 21 2020, 5:01 AM

Harbormaster completed remote builds in B46985: Diff 245816. Feb 21 2020, 5:02 AM

We already have several implementations of loop tiling (lib/Transforms/LoopTiling.cpp, lib/Dialect/Linalg/Transforms/Tiling.cpp, lib/Transforms/Utils/LoopUtils.cpp). Have you considered generalizing them instead of introducing yet another one? I suppose parallel loop nest as an operation removes the need for any supplementary preconditions, but the mechanics of the transformation should be very similar between parallel and sequential loops.

I agree we may want to peel off the last iterations to avoid complex conditions, but IMO it's better as an option to the transformation. @poechsel had a similar request with loops produced by Linalg tiling, so it would make sense to try and reuse the infra here.

ftynse requested changes to this revision. Feb 21 2020, 5:07 AM

ftynse added inline comments.

mlir/test/Dialect/Loops/parallel-loop-tiling.mlir
20	Please don't pattern-match SSA names, and generally prefer avoiding non-essential parts of the IR from the test (e.g. repeated types in the pattern). https://mlir.llvm.org/getting_started/TestingGuide/

Harbormaster completed remote builds in B46987: Diff 245822. Feb 21 2020, 5:20 AM

One round of addressed review comments

In D74954#1886199, @ftynse wrote:

We already have several implementations of loop tiling (lib/Transforms/LoopTiling.cpp, lib/Dialect/Linalg/Transforms/Tiling.cpp, lib/Transforms/Utils/LoopUtils.cpp). Have you considered generalizing them instead of introducing yet another one? I suppose parallel loop nest as an operation removes the need for any supplementary preconditions, but the mechanics of the transformation should be very similar between parallel and sequential loops.

The complexity of the tiling passes is in the dependency analysis and not the rewriting itself. This one is purely structural. We would need to split the actual tiling rewrite from the analysis, with interfaces for generating the tiled operation. I am not sure that is worth the code we would actually get to reuse.

mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopTiling.cpp
52	The replication part could be a separate pattern/pass that rewrites a loop with dynamic upper bounds specified with a constant upper bound (so an affine.min with a constant) into something like `bound = affine.min(constant, other thing); if (bound == constant) { /* loop with constant upper bound */ } else { /* loop with dynamic upper bound */ }`. Maybe this is good enough to make LLVM recognize the potential for vectorization. That rewrite could be applied independently.
96	Can we have an ArrayRef<unsigned> for tile sizes? So we can have different ones at different levels? I would suspect that we, for vectorization, always want to tile 1x...x1xN.
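The versioning idea from the comment on line 52 might look roughly like this in plain C++. The functions `addTile` and `addAll` and the constant `kTile` are hypothetical, and whether a compiler actually vectorizes the full-tile branch is not something this sketch demonstrates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int64_t kTile = 4; // assumed constant tile size

// Adds b to a over one tile starting at `offset`, versioned on whether the
// clamped bound equals the constant tile size.
void addTile(std::vector<float> &a, const std::vector<float> &b,
             int64_t offset) {
  int64_t n = static_cast<int64_t>(a.size());
  assert(b.size() == a.size() && offset >= 0 && offset <= n);
  int64_t bound = std::min(kTile, n - offset);
  if (bound == kTile) {
    // Full tile: the trip count is the compile-time constant kTile.
    for (int64_t i = 0; i < kTile; ++i)
      a[offset + i] += b[offset + i];
  } else {
    // Tail tile: the trip count is only known at run time.
    for (int64_t i = 0; i < bound; ++i)
      a[offset + i] += b[offset + i];
  }
}

// Outer loop of the tiled computation.
void addAll(std::vector<float> &a, const std::vector<float> &b) {
  int64_t n = static_cast<int64_t>(a.size());
  for (int64_t offset = 0; offset < n; offset += kTile)
    addTile(a, b, offset);
}
```

Both branches compute the same result; the only difference is that the first has a constant upper bound, which is the property the comment hopes a later vectorizer can use.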

herhut added inline comments. Feb 21 2020, 6:19 AM

mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopTiling.cpp
68	I always get confused by this myself. I think it needs to be `upper_bound - current_index_of_outer_loop` to get the number of remaining iterations. And then the min with the tiling size.
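To make the difference concrete, here is a small sketch of the two candidate bounds, assuming unit steps. `clampedTileSize` follows the formula proposed in this comment, while `sizeFromLowerBound` mirrors the operand order visible in Diff 245832 (tile size, upper bound, lower bound); both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Size of the tile starting at outerIv: at most tileSize, and never more
// than the number of iterations left before upperBound.
int64_t clampedTileSize(int64_t tileSize, int64_t upperBound,
                        int64_t outerIv) {
  assert(tileSize > 0 && outerIv < upperBound);
  return std::min(tileSize, upperBound - outerIv);
}

// Variant that subtracts the loop's lower bound instead. The result does not
// depend on the current tile, so the last tile is not shortened.
int64_t sizeFromLowerBound(int64_t tileSize, int64_t upperBound,
                           int64_t lowerBound) {
  assert(tileSize > 0 && lowerBound < upperBound);
  return std::min(tileSize, upperBound - lowerBound);
}
```

With bounds 0 to 10 and a tile size of 4, the tile starting at 8 gets size 2 from the first function but size 4 from the second, which would run two iterations past the upper bound.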

Harbormaster completed remote builds in B46991: Diff 245832. Feb 21 2020, 6:33 AM

Make tileSizes a list
Fix min again
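A per-dimension tile size list, such as the 1x...x1xN shape suggested for vectorization, can be sketched in two dimensions as follows. This is plain C++ with unit steps and zero lower bounds assumed; `tiled2D` and `Point` are hypothetical names, not part of the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Point = std::pair<int64_t, int64_t>;

// Visits the rectangle [0, rows) x [0, cols) with unit steps, tiled by
// tileSizes[0] x tileSizes[1].
std::vector<Point> tiled2D(int64_t rows, int64_t cols,
                           const std::vector<int64_t> &tileSizes) {
  assert(tileSizes.size() == 2 && tileSizes[0] > 0 && tileSizes[1] > 0);
  std::vector<Point> visited;
  for (int64_t i0 = 0; i0 < rows; i0 += tileSizes[0])
    for (int64_t j0 = 0; j0 < cols; j0 += tileSizes[1]) {
      // Each dimension is clamped independently against its own bound.
      int64_t iBound = std::min(tileSizes[0], rows - i0);
      int64_t jBound = std::min(tileSizes[1], cols - j0);
      for (int64_t i = 0; i < iBound; ++i)
        for (int64_t j = 0; j < jBound; ++j)
          visited.emplace_back(i0 + i, j0 + j);
    }
  return visited;
}
```

Calling `tiled2D(3, 6, {1, 4})` leaves the first dimension effectively untiled and splits only the second one, so each inner tile covers a run of up to four consecutive column indices.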

Harbormaster completed remote builds in B47001: Diff 245847. Feb 21 2020, 7:28 AM

Just adding some minor nit comments!

We already have several implementations of loop tiling (lib/Transforms/LoopTiling.cpp, lib/Dialect/Linalg/Transforms/Tiling.cpp, lib/Transforms/Utils/LoopUtils.cpp). Have you considered generalizing them instead of introducing yet another one? I suppose parallel loop nest as an operation removes the need for any supplementary preconditions, but the mechanics of the transformation should be very similar between parallel and sequential loops.

The complexity of the tiling passes is in the dependency analysis and not the rewriting itself. This one is purely structural. We would need to split the actual tiling rewrite from the analysis, with interfaces for generating the tiled operation. I am not sure that is worth the code we would actually get to reuse.

I think it might be worth it since affine parallel for would be another candidate for tiling.

mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopTiling.cpp
41	`size_t i = 0; i != op.upperBound().size();` -> `size_t i = 0, end = op.upperBound().size(); i != end;` ?
116	please drop {} for single line for and if-else statements
mlir/test/Dialect/Loops/parallel-loop-tiling.mlir
38	Please add `// -----` between tests for split-input-file to work

In D74954#1887027, @dcaballe wrote:

Just adding some minor nit comments!

We already have several implementations of loop tiling (lib/Transforms/LoopTiling.cpp, lib/Dialect/Linalg/Transforms/Tiling.cpp, lib/Transforms/Utils/LoopUtils.cpp). Have you considered generalizing them instead of introducing yet another one? I suppose parallel loop nest as an operation removes the need for any supplementary preconditions, but the mechanics of the transformation should be very similar between parallel and sequential loops.

The complexity of the tiling passes is in the dependency analysis and not the rewriting itself. This one is purely structural. We would need to split the actual tiling rewrite from the analysis, with interfaces for generating the tiled operation. I am not sure that is worth the code we would actually get to reuse.

I think it might be worth it since affine parallel for would be another candidate for tiling.

I guess we would need to compute the new step sizes as affine expressions as well in that case. My point was about how much code reuse we would actually get.

mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopTiling.cpp
34	This comment does not state the actual rewriting, does it?

Thanks!

LGTM with comments addressed.

mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopTiling.cpp
40	mega-nit: `tileSizeConstants.reserve`

Address moar comments

Herald added a reviewer: espindola. Feb 24 2020, 2:42 AM

Herald added a reviewer: alexander-shaposhnikov.

Herald added a reviewer: rupprecht.

Herald added a reviewer: jhenderson.

Herald added a reviewer: jdoerfert.

Herald added a reviewer: sstefan1.

Herald added a reviewer: mravishankar.

Herald added a reviewer: antiagainst.

Herald added a reviewer: rriddle.

Herald added a reviewer: antiagainst.

Herald added a reviewer: uenoku.

Herald added projects: Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project.

Herald added subscribers: libc-commits, libcxx-commits, lldb-commits and 30 others.

WTF phab?

Herald added a reviewer: mclow.lists. Feb 24 2020, 2:43 AM

Herald added a subscriber: wuzish.

teemperor removed subscribers: cfe-commits, lldb-commits, libcxx-commits, libc-commits. Feb 24 2020, 2:47 AM

This revision was not accepted when it landed; it landed in state Needs Review. Feb 24 2020, 2:48 AM

Closed by commit rGbc1947a6f51f: Add a basic tiling pass for parallel loops (authored by bkramer).

This revision was automatically updated to reflect the committed changes.

teemperor removed reviewers: espindola, alexander-shaposhnikov, rupprecht, jhenderson, jdoerfert, sstefan1, mravishankar, antiagainst, rriddle, uenoku, mclow.lists. Feb 24 2020, 2:49 AM

teemperor removed projects: Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project.

Harbormaster failed remote builds in B47116: Diff 246166! Feb 24 2020, 3:19 AM

Harbormaster completed remote builds in B47117: Diff 246167. Feb 24 2020, 3:29 AM

rriddle added inline comments. Feb 24 2020, 9:09 AM

mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopTiling.cpp
113	nit: Please drop the trivial braces. Here and below.

Diff 245832

mlir/include/mlir/Dialect/LoopOps/Passes.h


	namespace mlir {			namespace mlir {

	class Pass;			class Pass;

	/// Creates a loop fusion pass which fuses parallel loops.			/// Creates a loop fusion pass which fuses parallel loops.
	std::unique_ptr<Pass> createParallelLoopFusionPass();			std::unique_ptr<Pass> createParallelLoopFusionPass();

				/// Creates a pass which tiles innermost parallel loops.
				std::unique_ptr<Pass> createParallelLoopTilingPass(int64_t tileSize = 4);

	} // namespace mlir			} // namespace mlir

	#endif // MLIR_DIALECT_LOOPOPS_PASSES_H_			#endif // MLIR_DIALECT_LOOPOPS_PASSES_H_

mlir/include/mlir/InitAllPasses.h

Show First 20 Lines • Show All 103 Lines • ▼ Show 20 Lines	#endif
createLinalgPromotionPass(0);		createLinalgPromotionPass(0);
createConvertLinalgToLoopsPass();		createConvertLinalgToLoopsPass();
createConvertLinalgToParallelLoopsPass();		createConvertLinalgToParallelLoopsPass();
createConvertLinalgToAffineLoopsPass();		createConvertLinalgToAffineLoopsPass();
createConvertLinalgToLLVMPass();		createConvertLinalgToLLVMPass();

// LoopOps		// LoopOps
createParallelLoopFusionPass();		createParallelLoopFusionPass();
		createParallelLoopTilingPass();

// QuantOps		// QuantOps
quant::createConvertSimulatedQuantPass();		quant::createConvertSimulatedQuantPass();
quant::createConvertConstPass();		quant::createConvertConstPass();
quantizer::createAddDefaultStatsPass();		quantizer::createAddDefaultStatsPass();
quantizer::createRemoveInstrumentationPass();		quantizer::createRemoveInstrumentationPass();
quantizer::registerInferQuantizedTypesPass();		quantizer::registerInferQuantizedTypesPass();


mlir/lib/Dialect/LoopOps/Transforms/CMakeLists.txt

	add_llvm_library(MLIRLoopOpsTransforms			add_llvm_library(MLIRLoopOpsTransforms
	ParallelLoopFusion.cpp			ParallelLoopFusion.cpp
				ParallelLoopTiling.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/LoopOps			${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/LoopOps
	)			)

	target_link_libraries(MLIRLoopOpsTransforms			target_link_libraries(MLIRLoopOpsTransforms
	MLIRPass			MLIRPass
	MLIRLoopOps			MLIRLoopOps
	)			)

mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopTiling.cpp

This file was added.

				//===- ParallelLoopTiling.cpp - Tiles loop.parallel ---------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements loop tiling on parallel loops.
				//
				//===----------------------------------------------------------------------===//

				#include "mlir/Dialect/AffineOps/AffineOps.h"
				#include "mlir/Dialect/LoopOps/LoopOps.h"
				#include "mlir/Dialect/LoopOps/Passes.h"
				#include "mlir/Dialect/StandardOps/Ops.h"
				#include "mlir/Pass/Pass.h"
				#include "mlir/Transforms/RegionUtils.h"
				#include "llvm/Support/CommandLine.h"

				using namespace mlir;
				using loop::ParallelOp;

				/// Tile a parallel loop of the form
				/// loop.parallel (%i0, %i1) = (%arg0, %arg1) to (%arg2, %arg3)
				/// step (%arg4, %arg5)
				///
				/// into
				/// loop.parallel (%i0, %i1) = (%arg0, %arg1) to (%arg2, %arg3)
				/// step (%arg4*tileSize,
				/// %arg5*tileSize)
				/// loop.parallel (%i0, %i1) = (0, 0) to (min(%arg2, %arg0),
				/// min(%arg3, %arg1))
				/// step (%arg4, %arg5)
				/// The old loop is replaced with the new one.
				static void tileParallelLoop(ParallelOp op, int64_t tileSize) {
				OpBuilder b(op);
				auto zero = b.create<ConstantIndexOp>(op.getLoc(), 0);
				auto tileSizeConstant = b.create<ConstantIndexOp>(op.getLoc(), tileSize);

				// Create the outer loop with adjusted steps.
				SmallVector<Value, 2> newSteps;
				newSteps.reserve(op.step().size());
				for (Value step : op.step()) {
				newSteps.push_back(b.create<MulIOp>(op.getLoc(), step, tileSizeConstant));
				}
				auto outerLoop = b.create<ParallelOp>(op.getLoc(), op.lowerBound(),
				op.upperBound(), newSteps);
				b.setInsertionPointToStart(outerLoop.getBody());

				// Compute min(size, dim - offset) to avoid out-of-bounds accesses.
				// FIXME: Instead of using min, we want to replicate the tail. This would give
				// the inner loop constant bounds for easy vectorization.
				auto minMap = AffineMap::get(
				/*dimCount=*/3, /*symbolCount=*/0,
				{getAffineDimExpr(/*position=*/0, b.getContext()),
				getAffineDimExpr(/*position=*/1, b.getContext()) -
				getAffineDimExpr(/*position=*/2, b.getContext())});

				// Create the inner loop with adjusted bounds.
				SmallVector<Value, 2> newBounds;
				newBounds.reserve(op.upperBound().size());
				for (auto bounds : llvm::zip(op.upperBound(), op.lowerBound())) {
				newBounds.push_back(
				b.create<AffineMinOp>(op.getLoc(), b.getIndexType(), minMap,
				ValueRange{tileSizeConstant, std::get<0>(bounds),
				std::get<1>(bounds)}));
				}
				auto innerLoop = b.create<ParallelOp>(
				op.getLoc(), SmallVector<Value, 2>(newBounds.size(), zero), newBounds,
				op.step());

				// Steal the body of the old parallel loop and erase it.
				innerLoop.region().takeBody(op.region());
				op.erase();
				}

				/// Get a list of most nested parallel loops. Assumes that ParallelOps are only
				/// directly nested.
				static bool getInnermostNestedLoops(Block *block,
				SmallVectorImpl<ParallelOp> &loops) {
				bool hasInnerLoop = false;
				for (auto parallelOp : block->getOps<ParallelOp>()) {
				hasInnerLoop = true;
				if (!getInnermostNestedLoops(parallelOp.getBody(), loops)) {
				loops.push_back(parallelOp);
				}
				}
				return hasInnerLoop;
				}

				namespace {
				struct ParallelLoopTiling : public FunctionPass<ParallelLoopTiling> {
				ParallelLoopTiling() = default;
				ParallelLoopTiling(const ParallelLoopTiling &) {} // tileSize is non-copyable.
				explicit ParallelLoopTiling(int64_t tileSize) { this->tileSize = tileSize; }

				void runOnFunction() override {
				SmallVector<ParallelOp, 2> mostNestedParallelOps;
				for (Block &block : getFunction()) {
				getInnermostNestedLoops(&block, mostNestedParallelOps);
				}
				for (ParallelOp pLoop : mostNestedParallelOps) {
				tileParallelLoop(pLoop, tileSize);
				}
				}

				Option<int64_t> tileSize{
				*this, "parallel-loop-tile-size",
				llvm::cl::desc("factor to tile innermost parallel loops by"),
				llvm::cl::init(4)};
				};
				} // namespace

				std::unique_ptr<Pass> mlir::createParallelLoopTilingPass(int64_t tileSize) {
				return std::make_unique<ParallelLoopTiling>(tileSize);
				}

				static PassRegistration<ParallelLoopTiling> pass("parallel-loop-tiling",
				"Tile parallel loops.");

mlir/test/Dialect/Loops/parallel-loop-tiling.mlir

This file was added.

				// RUN: mlir-opt %s -pass-pipeline='func(parallel-loop-tiling)' -split-input-file | FileCheck %s --dump-input-on-failure

				func @parallel_loop(%arg0 : index, %arg1 : index, %arg2 : index,
				%arg3 : index, %arg4 : index, %arg5 : index,
				%A: memref<?x?xf32>, %B: memref<?x?xf32>,
				%C: memref<?x?xf32>, %result: memref<?x?xf32>) {
				loop.parallel (%i0, %i1) = (%arg0, %arg1) to (%arg2, %arg3) step (%arg4, %arg5) {
				%B_elem = load %B[%i0, %i1] : memref<?x?xf32>
				%C_elem = load %C[%i0, %i1] : memref<?x?xf32>
				%sum_elem = addf %B_elem, %C_elem : f32
				store %sum_elem, %result[%i0, %i1] : memref<?x?xf32>
				}
				return
				}

				// CHECK: #map0 = affine_map<(d0, d1, d2) -> (d0, d1 - d2)>

				// CHECK-LABEL: func @parallel_loop(
				// CHECK-SAME: [[VAL_0:%.*]]: index, [[VAL_1:%.*]]: index, [[VAL_2:%.*]]: index, [[VAL_3:%.*]]: index, [[VAL_4:%.*]]: index, [[VAL_5:%.*]]: index, [[VAL_6:%.*]]: memref<?x?xf32>, [[VAL_7:%.*]]: memref<?x?xf32>, [[VAL_8:%.*]]: memref<?x?xf32>, [[VAL_9:%.*]]: memref<?x?xf32>) {
				// CHECK: [[VAL_10:%.*]] = constant 0 : index
				// CHECK: [[VAL_11:%.*]] = constant 4 : index
				// CHECK: [[VAL_12:%.*]] = muli [[VAL_4]], [[VAL_11]] : index
				// CHECK: [[VAL_13:%.*]] = muli [[VAL_5]], [[VAL_11]] : index
				// CHECK: loop.parallel ([[VAL_14:%.*]], [[VAL_15:%.*]]) = ([[VAL_0]], [[VAL_1]]) to ([[VAL_2]], [[VAL_3]]) step ([[VAL_12]], [[VAL_13]]) {
				// CHECK: [[VAL_16:%.*]] = affine.min #map0([[VAL_11]], [[VAL_2]], [[VAL_0]])
				// CHECK: [[VAL_17:%.*]] = affine.min #map0([[VAL_11]], [[VAL_3]], [[VAL_1]])
				// CHECK: loop.parallel ([[VAL_18:%.*]], [[VAL_19:%.*]]) = ([[VAL_10]], [[VAL_10]]) to ([[VAL_16]], [[VAL_17]]) step ([[VAL_4]], [[VAL_5]]) {
				// CHECK: [[VAL_20:%.*]] = load [[VAL_7]]{{\[}}[[VAL_18]], [[VAL_19]]] : memref<?x?xf32>
				// CHECK: [[VAL_21:%.*]] = load [[VAL_8]]{{\[}}[[VAL_18]], [[VAL_19]]] : memref<?x?xf32>
				// CHECK: [[VAL_22:%.*]] = addf [[VAL_20]], [[VAL_21]] : f32
				// CHECK: store [[VAL_22]], [[VAL_9]]{{\[}}[[VAL_18]], [[VAL_19]]] : memref<?x?xf32>
				// CHECK: "loop.terminator"() : () -> ()
				// CHECK: }
				// CHECK: "loop.terminator"() : () -> ()
				// CHECK: }
				// CHECK: return
				// CHECK: }


				func @tile_nested_innermost() {
				%c2 = constant 2 : index
				%c0 = constant 0 : index
				%c1 = constant 1 : index
				loop.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) {
				loop.parallel (%k, %l) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) {
				}
				}
				loop.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) {
				}
				return
				}

				// CHECK-LABEL: func @tile_nested_innermost() {
				// CHECK: [[VAL_23:%.*]] = constant 2 : index
				// CHECK: [[VAL_24:%.*]] = constant 0 : index
				// CHECK: [[VAL_25:%.*]] = constant 1 : index
				// CHECK: loop.parallel ([[VAL_26:%.*]], [[VAL_27:%.*]]) = ([[VAL_24]], [[VAL_24]]) to ([[VAL_23]], [[VAL_23]]) step ([[VAL_25]], [[VAL_25]]) {
				// CHECK: [[VAL_28:%.*]] = constant 0 : index
				// CHECK: [[VAL_29:%.*]] = constant 4 : index
				// CHECK: [[VAL_30:%.*]] = muli [[VAL_25]], [[VAL_29]] : index
				// CHECK: [[VAL_31:%.*]] = muli [[VAL_25]], [[VAL_29]] : index
				// CHECK: loop.parallel ([[VAL_32:%.*]], [[VAL_33:%.*]]) = ([[VAL_24]], [[VAL_24]]) to ([[VAL_23]], [[VAL_23]]) step ([[VAL_30]], [[VAL_31]]) {
				// CHECK: [[VAL_34:%.*]] = affine.min #map0([[VAL_29]], [[VAL_23]], [[VAL_24]])
				// CHECK: [[VAL_35:%.*]] = affine.min #map0([[VAL_29]], [[VAL_23]], [[VAL_24]])
				// CHECK: loop.parallel ([[VAL_36:%.*]], [[VAL_37:%.*]]) = ([[VAL_28]], [[VAL_28]]) to ([[VAL_34]], [[VAL_35]]) step ([[VAL_25]], [[VAL_25]]) {
				// CHECK: "loop.terminator"() : () -> ()
				// CHECK: }
				// CHECK: "loop.terminator"() : () -> ()
				// CHECK: }
				// CHECK: "loop.terminator"() : () -> ()
				// CHECK: }
				// CHECK: [[VAL_38:%.*]] = constant 0 : index
				// CHECK: [[VAL_39:%.*]] = constant 4 : index
				// CHECK: [[VAL_40:%.*]] = muli [[VAL_25]], [[VAL_39]] : index
				// CHECK: [[VAL_41:%.*]] = muli [[VAL_25]], [[VAL_39]] : index
				// CHECK: loop.parallel ([[VAL_42:%.*]], [[VAL_43:%.*]]) = ([[VAL_24]], [[VAL_24]]) to ([[VAL_23]], [[VAL_23]]) step ([[VAL_40]], [[VAL_41]]) {
				// CHECK: [[VAL_44:%.*]] = affine.min #map0([[VAL_39]], [[VAL_23]], [[VAL_24]])
				// CHECK: [[VAL_45:%.*]] = affine.min #map0([[VAL_39]], [[VAL_23]], [[VAL_24]])
				// CHECK: loop.parallel ([[VAL_46:%.*]], [[VAL_47:%.*]]) = ([[VAL_38]], [[VAL_38]]) to ([[VAL_44]], [[VAL_45]]) step ([[VAL_25]], [[VAL_25]]) {
				// CHECK: "loop.terminator"() : () -> ()
				// CHECK: }
				// CHECK: "loop.terminator"() : () -> ()
				// CHECK: }
				// CHECK: return
				// CHECK: }

This is an archive of the discontinued LLVM Phabricator instance.

Add a basic tiling pass for parallel loops
ClosedPublic
