This is an archive of the discontinued LLVM Phabricator instance.

Differential D104249

[mlir] Enable cleanup of single iteration reduction loops being sibling-fused maximally
ClosedPublic

Authored by sumesh13 on Jun 14 2021, 11:22 AM.

Details

Reviewers

bondhugula
nicolasvasilache
vinayaka-polymage
dcaballe

Commits

rGada580863f89: [mlir] Enable cleanup of single iteration reduction loops being sibling-fused…

Summary

Changes include the following:

- Single-iteration reduction loops being sibling-fused at the innermost insertion level are skipped when collecting sequential loops. Otherwise, the slice bounds of these loops are reset.
- Loops that were skipped in the previous step are promoted into their outer loops.
- Two utility functions, buildSliceTripCountMap and getSliceIterationCount, are moved from mlir/lib/Transforms/Utils/LoopFusionUtils.cpp to mlir/lib/Analysis/Utils.cpp.
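
As a rough illustration of what the two relocated helpers are used for, the sketch below models a slice trip count map and derives a slice iteration count from it. It assumes the count is the product of the per-loop constant trip counts. `TripCountMap`, `sliceIterationCount`, and `isSingleIterationSlice` are hypothetical names for this standalone model, not the MLIR signatures (the real map is keyed by `Operation *`).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the slice trip count map:
// loop name -> constant trip count.
using TripCountMap = std::map<std::string, uint64_t>;

// Total number of iterations of the slice, assumed here to be the product
// of the trip counts of all loops in the map.
uint64_t sliceIterationCount(const TripCountMap &tripCounts) {
  uint64_t count = 1;
  for (const auto &entry : tripCounts)
    count *= entry.second;
  return count;
}

// A slice runs exactly once when the product of its trip counts is 1.
bool isSingleIterationSlice(const TripCountMap &tripCounts) {
  return sliceIterationCount(tripCounts) == 1;
}

// Usage: a unit slice versus a slice with a 64-iteration loop.
void checkSliceIterationCount() {
  TripCountMap unit = {{"i", 1}, {"j", 1}};
  TripCountMap wide = {{"i", 1}, {"j", 64}};
  assert(isSingleIterationSlice(unit));
  assert(sliceIterationCount(wide) == 64);
}
```

In this model, only the unit slice would qualify for the single-iteration cleanup described in the summary.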

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

sumesh13 created this revision. Jun 14 2021, 11:22 AM

Herald added subscribers: dcaballe, cota, teijeong and 16 others. Jun 14 2021, 11:22 AM

sumesh13 requested review of this revision. Jun 14 2021, 11:22 AM

Herald added a reviewer: nicolasvasilache. Jun 14 2021, 11:22 AM

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

Harbormaster completed remote builds in B109150: Diff 351940. Jun 14 2021, 12:26 PM

bondhugula added reviewers: vinayaka-polymage, dcaballe. Jun 15 2021, 10:24 AM

Thanks for this. I am posting comments based on a very superficial glance; I will go through it in more detail.

I have posted some minor inline comments.
It would be good to wrap the commit message to the standard wrap width.

Also, moving some of the slice utils from `LoopFusionUtils.h` looks good too.

mlir/include/mlir/Transforms/LoopFusionUtils.h
118	Nit: `isInnermostInsertionFusion` close backtick.
mlir/lib/Analysis/Utils.cpp
1112–1115	This can directly become a ternary operator?

This revision now requires changes to proceed. Jun 16 2021, 9:17 AM

bondhugula requested changes to this revision. Jun 16 2021, 11:56 AM

bondhugula added inline comments.

mlir/include/mlir/Analysis/Utils.h
190	in the given slice <-- This is not really for a given generic slice but `given a slice trip count map` where the counts have already been determined to be constants.
mlir/include/mlir/Dialect/Affine/IR/AffineOps.h
407	`added` is confusing here. Please clarify as to whether it's appended or replaced. Or you can simply drop `added` to indicate they are completely replaced.
414	Nit: return values of the cloned loop ...
420	Nit: drop blank line.
mlir/lib/Analysis/Utils.cpp
1351	`!reductions.empty()`
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
1899–1926	It's actually possible to clone the loop here without cloning the individual ops in the old loop's body. You can have the new loop just take the old loop's body - the old arguments just come along with the body. You need no remapping for the args nor does the IV need any remapping. The new block arguments can be appended and so there is no value remapping needed. With a move, the code becomes much shorter and this also saves a lot of reallocations for large loop bodies.
mlir/lib/Transforms/Utils/LoopFusionUtils.cpp
367–368	Reflow.
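
The suggestion above for AffineOps.cpp (let the new loop take over the old loop's body instead of cloning each op) can be pictured with a small standalone model. This is a minimal sketch, not MLIR code: `Loop`, `replaceWithNewYields`, and the use of `std::list` are assumptions chosen for illustration. `std::list::splice` relinks the existing nodes, so no body operation is copied and the old loop ends up empty.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, much simplified model of a loop op: a list of body
// operations plus the names of the values it yields.
struct Loop {
  std::list<std::string> body;
  std::vector<std::string> yields;
};

// Builds a replacement loop that takes over the old body and appends the
// new yielded values. The old loop is left without a body.
Loop replaceWithNewYields(Loop &oldLoop,
                          const std::vector<std::string> &newYields) {
  Loop newLoop;
  // Relink the body nodes into the new loop; nothing is copied.
  newLoop.body.splice(newLoop.body.end(), oldLoop.body);
  newLoop.yields = std::move(oldLoop.yields);
  oldLoop.yields.clear();
  newLoop.yields.insert(newLoop.yields.end(), newYields.begin(),
                        newYields.end());
  return newLoop;
}

// Usage: the first body op keeps its address, so it was moved, not cloned.
void checkBodyIsMovedNotCopied() {
  Loop oldLoop;
  oldLoop.body = {"affine.load", "addf"};
  oldLoop.yields = {"%sum"};
  const std::string *firstOp = &oldLoop.body.front();

  Loop newLoop = replaceWithNewYields(oldLoop, {"%prod"});
  assert(oldLoop.body.empty());
  assert(newLoop.body.size() == 2);
  assert(&newLoop.body.front() == firstOp);
  assert(newLoop.yields.size() == 2);
}
```

Because the input loop loses its body in this scheme, the operation behaves like a replacement rather than a clone.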

Address Code review comments

Thanks for the feedback. I have addressed the review comments. Please take a look.

Harbormaster completed remote builds in B110235: Diff 353414. Jun 21 2021, 10:57 AM

bondhugula added inline comments. Jun 21 2021, 11:15 PM

mlir/lib/Analysis/Utils.cpp
1108–1109	This is a stale comment - you may as well drop it in this revision since it's close to your changes.
1130–1131	Typo in this clause near "... and have been ...".
1351	No need of parentheses?
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
1903	Nit: instruction -> operation
1909	I think you don't need to materialize the yield operands into a SmallVector - the operand_range will convert to a `ValueRange` and the builder should take a ValueRange.
mlir/lib/Transforms/Utils/LoopFusionUtils.cpp
370	Please return a `LogicalResult` (just like promoteIfSingleIteration).
419	I don't think you need `getOperations()` - just `block.splice` may work.
468	Drop blank line.

In summary, I think the attempt here is to fuse two reduction loops under sibling fusion, which is good. However, the conditions in this patch need to be stricter for this to work. Here is a counterexample (the output is incorrect code):

func @reduce_add_f32_f32(%arg0: memref<64x64xf32, 1>, %arg1: memref<1x64xf32, 1>, %arg2: memref<1x64xf32, 1>) {
  %cst_0 = constant 0.000000e+00 : f32
  %cst_1 = constant 1.000000e+00 : f32
  %0 = memref.alloca() : memref<f32, 1>
  %1 = memref.alloca() : memref<f32, 1>
  affine.for %arg3 = 0 to 1 {
    affine.for %arg4 = 0 to 64 {
      %accum = affine.for %arg5 = 0 to 64 iter_args (%prevAccum = %cst_0) -> f32 {
        %4 = affine.load %arg0[%arg5, %arg4] : memref<64x64xf32, 1>
        %5 = addf %prevAccum, %4 : f32
        affine.yield %5 : f32
      }
      %accum_dbl = addf %accum, %accum : f32
      affine.store %accum_dbl, %arg1[%arg3, %arg4] : memref<1x64xf32, 1>
    }
  }
  affine.for %arg3 = 0 to 1 {
    affine.for %arg4 = 0 to 64 {
      %accum = affine.for %arg5 = 0 to 32 iter_args (%prevAccum = %cst_1) -> f32 { // modified the test case on this line 64 => 32
        %4 = affine.load %arg0[%arg5, %arg4] : memref<64x64xf32, 1>
        %5 = mulf %prevAccum, %4 : f32
        affine.yield %5 : f32
      }
      %accum_sqr = mulf %accum, %accum : f32
      affine.store %accum_sqr, %arg2[%arg3, %arg4] : memref<1x64xf32, 1>
    }
  }
  return
}

The result with the proposed change will be:

module  {
  func @reduce_add_f32_f32(%arg0: memref<64x64xf32, 1>, %arg1: memref<1x64xf32, 1>, %arg2: memref<1x64xf32, 1>) {
    %c0 = constant 0 : index
    %cst = constant 0.000000e+00 : f32
    %cst_0 = constant 1.000000e+00 : f32
    %0 = memref.alloca() : memref<f32, 1>
    %1 = memref.alloca() : memref<f32, 1>
    affine.for %arg3 = 0 to 1 {
      affine.for %arg4 = 0 to 64 {
        %2:2 = affine.for %arg5 = 0 to 32 iter_args(%arg6 = %cst_0, %arg7 = %cst) -> (f32, f32) {
          %5 = affine.load %arg0[%arg5, %arg4] : memref<64x64xf32, 1>
          %6 = addf %arg7, %5 : f32
          %7 = affine.load %arg0[%arg5, %arg4] : memref<64x64xf32, 1>
          %8 = mulf %arg6, %7 : f32
          affine.yield %8, %6 : f32, f32
        }
        %3 = addf %2#1, %2#1 : f32
        affine.store %3, %arg1[%c0, %arg4] : memref<1x64xf32, 1>
        %4 = mulf %2#0, %2#0 : f32
        affine.store %4, %arg2[%arg3, %arg4] : memref<1x64xf32, 1>
      }
    }
    return
  }
}

Note that reduction on addition was supposed to happen on 64 elements, but is now happening only on 32.
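
The effect on a single column can be checked numerically with a standalone model. This is a sketch under simplifying assumptions: one column of 64 values stands in for `%arg0[*, %arg4]`, and the names `referenceSum`, `referenceProduct`, and `fusedWithSharedBound` are hypothetical. The post-loop `addf`/`mulf` on the accumulators is left out because it does not change the comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// First original nest: add-reduction over all 64 rows of one column.
float referenceSum(const std::vector<float> &column) {
  assert(column.size() >= 64);
  float acc = 0.0f;
  for (std::size_t row = 0; row < 64; ++row)
    acc += column[row];
  return acc;
}

// Second original nest: mul-reduction over the first 32 rows only.
float referenceProduct(const std::vector<float> &column) {
  assert(column.size() >= 32);
  float acc = 1.0f;
  for (std::size_t row = 0; row < 32; ++row)
    acc *= column[row];
  return acc;
}

struct FusedResult {
  float sum;
  float product;
};

// Fused loop as in the incorrect output: both reductions share the bound 32.
FusedResult fusedWithSharedBound(const std::vector<float> &column) {
  assert(column.size() >= 32);
  FusedResult result = {0.0f, 1.0f};
  for (std::size_t row = 0; row < 32; ++row) {
    result.sum += column[row];
    result.product *= column[row];
  }
  return result;
}

// Usage: with a column of ones, the fused sum covers only half the rows.
void checkCounterexample() {
  std::vector<float> column(64, 1.0f);
  FusedResult fused = fusedWithSharedBound(column);
  assert(referenceSum(column) == 64.0f);
  assert(fused.sum == 32.0f);
  assert(fused.product == referenceProduct(column));
}
```

The product agrees with the original program, while the sum does not, which matches the note above.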

This revision now requires changes to proceed. Jun 22 2021, 3:42 AM

Address review comments and add check to only do cleanup if the loop trip counts match.
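
A guard of the kind this update describes (only clean up when the loop trip counts match) might look like the following standalone sketch. The thread does not spell out the actual condition in the patch, so `TripCount` and `tripCountsMatch` are hypothetical, and treating a non-constant trip count as a mismatch is an assumption made here to stay conservative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a loop's trip count. `isConstant` is false
// when the bounds are not compile-time constants.
struct TripCount {
  bool isConstant;
  uint64_t value;
};

// Conservative guard: allow the cleanup only when both trip counts are
// known constants and equal.
bool tripCountsMatch(TripCount src, TripCount dst) {
  return src.isConstant && dst.isConstant && src.value == dst.value;
}

// Usage: the 64-vs-32 counterexample is rejected, equal bounds are accepted.
void checkTripCountGuard() {
  TripCount c64 = {true, 64};
  TripCount c32 = {true, 32};
  TripCount unknown = {false, 0};
  assert(tripCountsMatch(c64, c64));
  assert(!tripCountsMatch(c64, c32));
  assert(!tripCountsMatch(unknown, c64));
}
```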

Thanks for the additional comments.

Thanks for pointing this out. I have introduced an additional check. Would this suffice?

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
1909	Could you please clarify? I am also appending the operands of the yield operation from the inner for loop.
mlir/lib/Transforms/Utils/LoopFusionUtils.cpp
419	I see there is no splice in block itself.

Harbormaster completed remote builds in B110687: Diff 354049. Jun 23 2021, 12:42 PM

bondhugula added inline comments. Jun 23 2021, 10:51 PM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
1909	I missed that - sorry. It should be fine then.

@sumesh13 Could you also please mention whether some of the methods were just moved without any change? That's useful in the commit summary.

mlir/lib/Transforms/Utils/LoopFusionUtils.cpp
388–390	Reflow - please use whole width. instruction -> operation

bondhugula added inline comments. Jun 23 2021, 11:02 PM

mlir/include/mlir/Dialect/Affine/IR/AffineOps.h
417	After the change to `takeBody`, this would no longer be a clone! The input operation `loop` is no longer the same and will have no body. We should document this, and the name of this method should ideally change. Do you want to erase `loop` here as well? In that case this can be called `replaceWithNewYields`.

bondhugula added inline comments. Jun 24 2021, 5:59 AM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
1878	Since this is exposed in `mlir::`, you'll need to name it something like `cloneForOpWithNewYields`.

Address code review comments including rename API to replaceForOpWithNewYields

In D104249#2837879, @bondhugula wrote:

@sumesh13 Could you also please mention whether some of the methods were just moved without any change? That's useful in the commit summary.

Thanks for all the comments. Updated commit message and have addressed other comments.

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
1878	Thanks for pointing that out. I have renamed the function to now replaceForOpWithNewYields.

Harbormaster completed remote builds in B110859: Diff 354299. Jun 24 2021, 11:06 AM

Reflow a comment

Harbormaster completed remote builds in B110876: Diff 354322. Jun 24 2021, 12:23 PM

Thanks for addressing comments, sorry for getting back on this so late.

I have added requests for clarification in a few places. Can you please clarify those?

mlir/include/mlir/Analysis/Utils.h
52	This function actually requires the outer loop to be parallel, but also to contain reductions inside it, correct?
mlir/lib/Analysis/Utils.cpp
1137	You can directly use `isMaximal` here and update the comment above.
1346	Added a comment above. I might be confused here, but it seems like the function checks more than whether the loop is a reduction loop or not.
mlir/lib/Transforms/Utils/LoopFusionUtils.cpp
369	I think such a merge of an inner loop into its parent loop should happen when the bounds exactly match. Is this already being taken care of somewhere else?
401	As ops are being moved outside, care must be taken to make sure that no dependences are violated. In sibling fusion it is OK, as there are read-read dependences. So, can you please clarify whether such a promotion of loops should happen only in sibling fusion?
433	Depending on the answer to the question above, if promotion of reducing loops should happen only in sibling fusion, the option here needs to be renamed.

Address comments

@vinayaka-polymage Thanks for the additional comments. Can you please check whether my clarifications address your concerns?

mlir/include/mlir/Analysis/Utils.h
52	No, this check only does a limited check of whether a loop is a reduction. The caller only calls it for loops determined to be sequential. Please let me know if it's not clear and some clarification would be helpful.
52	No, the check is specifically whether it's a reduction loop. There is outer logic in Utils.cpp that only invokes it for sequential loops. Similarly, while fusing, it's only necessary to check whether the particular loop is a reduction. Please let me know if there is a clarification that can help here.
mlir/lib/Analysis/Utils.cpp
1346	Addressed the comment before.
mlir/lib/Transforms/Utils/LoopFusionUtils.cpp
369	The loop bounds are reset earlier if the bounds do not match. And in this case the merge is done only for a unit slice. Do you still see it necessary to check the bounds?
401	Yeah, I have renamed the option in the caller.

Add an extra clarification for when the promotion API can be safely used.

sumesh13 marked an inline comment as done. Jun 28 2021, 11:58 AM

sumesh13 added inline comments.

mlir/lib/Transforms/Utils/LoopFusionUtils.cpp
369	I have added a clarification.

Harbormaster completed remote builds in B111338: Diff 354974. Jun 28 2021, 12:25 PM

bondhugula added inline comments. Jul 3 2021, 10:17 AM

mlir/test/Transforms/loop-fusion.mlir
3153	This whole test case has become too long (3144 lines) and we should really split this file, perhaps right after this PR in an NFC PR.

bondhugula requested changes to this revision. Jul 3 2021, 10:25 AM

bondhugula added inline comments.

mlir/lib/Transforms/Utils/LoopFusionUtils.cpp
368–369	This doesn't look really clean to have this method named and implemented this way with a generic single argument, and then having a comment and TODO stating this only should be used for sibling fusion. If we do this, we should have a boolean argument `siblingFusion` set to true/false and it should work for both cases.
369	Reflow to use the whole line - here and anywhere else. Please make a sweep as reviewers really can't comment everywhere.
401–402	We shouldn't be having such TODOs - we should address this via a proper function signature.

This revision now requires changes to proceed. Jul 3 2021, 10:25 AM

Address comments

- Reflow all comments
- Additional argument to promoteSingleIterReduction to control when operations are moved out

sumesh13 marked 2 inline comments as done and an inline comment as not done. Jul 6 2021, 12:46 PM

sumesh13 added inline comments.

mlir/test/Transforms/loop-fusion.mlir
3153	Sure...I can move the purely sibling fusion related tests to a new file.

Harbormaster completed remote builds in B112674: Diff 356784. Jul 6 2021, 1:22 PM

Thanks for addressing the comments I had. A few more minor comments. I'll just defer to @vinayaka-polymage for the other logic part. Please do wait for his acceptance before merging.

mlir/lib/Transforms/LoopFusion.cpp
1782–1785	This change requires a comment.
mlir/lib/Transforms/Utils/LoopFusionUtils.cpp
382–383	You don't need a `static_cast` here.
416	Nit: Use `auto *it` to avoid clang-tidy warning.
mlir/test/Transforms/loop-fusion.mlir
3265	This trailing comment isn't clear. Please use a separate line and rephrase a bit.

bondhugula accepted this revision. Jul 8 2021, 7:54 PM

Address review comments

sumesh13 marked 6 inline comments as done. Jul 8 2021, 11:02 PM

sumesh13 added inline comments.

mlir/lib/Transforms/Utils/LoopFusionUtils.cpp
416	Didn't see a clang-tidy warning in this case.

Harbormaster completed remote builds in B113135: Diff 357419. Jul 8 2021, 11:32 PM

Changes look good now, minor clarification request.

mlir/lib/Analysis/Utils.cpp
1346	Minor: I think the function checks that the loop `forOp` is parallel and also contains parallel reduction loops. Line 1348 checks that the outer loop is parallel. Can you please clarify? A sequential outer loop containing a reduction loop inside it will result in `false` when passed to this function, right?

sumesh13 added inline comments. Jul 11 2021, 10:35 PM

mlir/lib/Analysis/Utils.cpp
1346	Yes, the intent here was to say that the loop, if not for the reduction, is parallel and hence need not have its bounds reset. So a sequential outer loop with a reduction inner loop will result in false. Please let me know if you see an opportunity to broaden the case.

vinayaka-polymage added inline comments. Jul 12 2021, 7:06 PM

mlir/lib/Analysis/Utils.cpp
1346	OK, so in that case, function name and its description in comments will have to change. As this is a utility, anyone might use it in future for detecting reduction loops. If the check is a custom one, name and description will have to reflect that. Does this make sense ?

sumesh13 added inline comments. Jul 12 2021, 7:18 PM

mlir/lib/Analysis/Utils.cpp
1346	Sure..I think that makes sense. Any suggestion? How about isNonSequentialDueToReduction?

Suggestion

mlir/lib/Analysis/Utils.cpp
1346	My suggestion is `isLoopParallelAndContainsReductions`.

isReductionLoop->isLoopParallelAndContainsReductionLoop
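
The semantics agreed on in the naming discussion above can be summarized with a tiny standalone model. This is only a sketch: `LoopSummary` and `modelIsLoopParallelAndContainsReduction` are hypothetical, and the two fields stand in for whatever analysis the real utility performs on an `AffineForOp`.

```cpp
#include <cassert>

// Hypothetical summary of a loop for this model.
struct LoopSummary {
  // True when the loop has no loop-carried dependence other than its
  // reductions.
  bool parallelApartFromReductions;
  // Number of reductions detected on the loop.
  unsigned numReductions;
};

// True only for loops that would be parallel if not for their reductions
// and that actually carry at least one reduction.
bool modelIsLoopParallelAndContainsReduction(const LoopSummary &loop) {
  return loop.parallelApartFromReductions && loop.numReductions > 0;
}

// Usage: the three cases raised in the review.
void checkPredicate() {
  LoopSummary reductionOnly = {true, 1};  // parallel apart from a reduction
  LoopSummary plainParallel = {true, 0};  // parallel, nothing to reduce
  LoopSummary sequential = {false, 1};    // sequential, with a reduction
  assert(modelIsLoopParallelAndContainsReduction(reductionOnly));
  assert(!modelIsLoopParallelAndContainsReduction(plainParallel));
  assert(!modelIsLoopParallelAndContainsReduction(sequential));
}
```

The third case is the one the reviewer asked about: a sequential loop yields `false` even though a reduction is present, which is why a plain `isReductionLoop` name was considered misleading.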

sumesh13 marked an inline comment as done. Jul 12 2021, 10:29 PM

Harbormaster completed remote builds in B113660: Diff 358157. Jul 12 2021, 11:02 PM

Thanks for addressing all the comments, I will make a final pass on it in a day.

Thanks for addressing all the comments, sorry once again for taking so long to review. As some ops were moving out from one for op to its parent, I wanted to closely review the scenarios.

I was mainly considering cases with bounds that don't align well, like the following example. For it, fusion produces incorrect code even without the changes proposed here. So I think we should address such cases separately; they are not introduced by this change.

func @reduce_add_f32_f32(%arg0: memref<64x64xf32, 1>, %arg1: memref<1x63xf32, 1>, %arg2: memref<1x64xf32, 1>) {
  %cst_0 = constant 0.000000e+00 : f32
  %cst_1 = constant 1.000000e+00 : f32
  %0 = memref.alloca() : memref<f32, 1>
  %1 = memref.alloca() : memref<f32, 1>
  affine.for %arg3 = 0 to 1 {
    affine.for %arg4 = 0 to 63 {
      %accum = affine.for %arg5 = 0 to 64 iter_args (%prevAccum = %cst_0) -> f32 {
        %4 = affine.load %arg0[%arg5, %arg4] : memref<64x64xf32, 1>
        %5 = addf %prevAccum, %4 : f32
        affine.yield %5 : f32
      }
      %accum_dbl = addf %accum, %accum : f32
      affine.store %accum_dbl, %arg1[%arg3, %arg4] : memref<1x63xf32, 1>
    }
  }
  affine.for %arg3 = 0 to 1 {
    affine.for %arg4 = 0 to 32 {
      %accum = affine.for %arg5 = 0 to 64 iter_args (%prevAccum = %cst_1) -> f32 {
        %4 = affine.load %arg0[%arg5, %arg4] : memref<64x64xf32, 1>
        %5 = mulf %prevAccum, %4 : f32
        affine.yield %5 : f32
      }
      %accum_sqr = mulf %accum, %accum : f32
      affine.store %accum_sqr, %arg2[%arg3, %arg4] : memref<1x64xf32, 1>
    }
  }
  return
}

This set of changes looks good to me!

This revision is now accepted and ready to land. Jul 14 2021, 10:49 AM

Thanks @bondhugula @vinayaka-polymage for the review and helpful comments!!

Fix a clang-format issue

Fix a clang-format issue

Harbormaster completed remote builds in B114051: Diff 358688. Jul 14 2021, 1:09 PM

One asan fix

Harbormaster completed remote builds in B114295: Diff 359046. Jul 15 2021, 12:43 PM

Closed by commit rGada580863f89: [mlir] Enable cleanup of single iteration reduction loops being sibling-fused… (authored by sumesh13). Jul 15 2021, 2:07 PM

This revision was automatically updated to reflect the committed changes.

sumesh13 added a commit: rGada580863f89: [mlir] Enable cleanup of single iteration reduction loops being sibling-fused….

Revision Contents

Path                                              Size
mlir/include/mlir/Analysis/Utils.h                15 lines
mlir/include/mlir/Dialect/Affine/IR/AffineOps.h   14 lines
mlir/include/mlir/Transforms/LoopFusionUtils.h    9 lines
mlir/lib/Analysis/Utils.cpp                       102 lines
mlir/lib/Dialect/Affine/IR/AffineOps.cpp          44 lines
mlir/lib/Transforms/LoopFusion.cpp                9 lines
mlir/lib/Transforms/Utils/LoopFusionUtils.cpp     153 lines
mlir/test/Transforms/loop-fusion.mlir             137 lines

Diff 358688

mlir/include/mlir/Analysis/Utils.h

[43 lines not shown]
 /// outermost `affine.for` or `affine.if` operation to the innermost one.
 void getEnclosingAffineForAndIfOps(Operation &op,
                                    SmallVectorImpl<Operation *> *ops);

 /// Returns the nesting depth of this operation, i.e., the number of loops
 /// surrounding this operation.
 unsigned getNestingDepth(Operation *op);

+/// Returns whether a loop is a parallel loop and contains a reduction loop.
+bool isLoopParallelAndContainsReduction(AffineForOp forOp);
+
 /// Returns in 'sequentialLoops' all sequential loops in loop nest rooted
 /// at 'forOp'.
 void getSequentialLoops(AffineForOp forOp,
                         llvm::SmallDenseSet<Value, 8> *sequentialLoops);

 /// Enumerates different result statuses of slice computation by
 /// `computeSliceUnion`
 // TODO: Identify and add different kinds of failures during slice computation.
[119 lines not shown]
 // %v = affine.load %0[%i1] : memref<100xf32> // 'depSinkAccess'
 // }
 //
 void getComputationSliceState(Operation *depSourceOp, Operation *depSinkOp,
                               FlatAffineConstraints *dependenceConstraints,
                               unsigned loopDepth, bool isBackwardSlice,
                               ComputationSliceState *sliceState);

+/// Return the number of iterations for the `slicetripCountMap` provided.
+uint64_t getSliceIterationCount(
+    const llvm::SmallDenseMap<Operation *, uint64_t, 8> &sliceTripCountMap);
+
+/// Builds a map 'tripCountMap' from AffineForOp to constant trip count for
+/// loop nest surrounding represented by slice loop bounds in 'slice'. Returns
+/// true on success, false otherwise (if a non-constant trip count was
+/// encountered).
+bool buildSliceTripCountMap(
+    const ComputationSliceState &slice,
+    llvm::SmallDenseMap<Operation *, uint64_t, 8> *tripCountMap);
+
 /// Computes in 'sliceUnion' the union of all slice bounds computed at
 /// 'loopDepth' between all dependent pairs of ops in 'opsA' and 'opsB', and
 /// then verifies if it is valid. The parameter 'numCommonLoops' is the number
 /// of loops common to the operations in 'opsA' and 'opsB'. If 'isBackwardSlice'
 /// is true, computes slice bounds for loop nest surrounding ops in 'opsA', as a
 /// function of IVs and symbols of loop nest surrounding ops in 'opsB' at
 /// 'loopDepth'. If 'isBackwardSlice' is false, computes slice bounds for loop
 /// nest surrounding ops in 'opsB', as a function of IVs and symbols of loop
[176 lines not shown]

mlir/include/mlir/Dialect/Affine/IR/AffineOps.h

[398 lines not shown] void buildAffineLoopNest(OpBuilder &builder, Location loc,
                          ArrayRef<int64_t> steps,
                          function_ref<void(OpBuilder &, Location, ValueRange)>
                              bodyBuilderFn = nullptr);
 void buildAffineLoopNest(OpBuilder &builder, Location loc, ValueRange lbs,
                          ValueRange ubs, ArrayRef<int64_t> steps,
                          function_ref<void(OpBuilder &, Location, ValueRange)>
                              bodyBuilderFn = nullptr);

+/// Replace `loop` with a new loop where `newIterOperands` are appended with
+/// new initialization values and `newYieldedValues` are added as new yielded
+/// values. The returned ForOp has `newYieldedValues.size()` new result values.
+/// Additionally, if `replaceLoopResults` is true, all uses of
+/// `loop.getResults()` are replaced with the first `loop.getNumResults()`
+/// return values of the original loop respectively. The original loop is
+/// deleted and the new loop returned.
+/// Prerequisite: `newIterOperands.size() == newYieldedValues.size()`.
+AffineForOp replaceForOpWithNewYields(OpBuilder &b, AffineForOp loop,
+                                      ValueRange newIterOperands,
+                                      ValueRange newYieldedValues,
+                                      ValueRange newIterArgs,
+                                      bool replaceLoopResults = true);
+
 /// AffineBound represents a lower or upper bound in the for operation.
 /// This class does not own the underlying operands. Instead, it refers
 /// to the operands stored in the AffineForOp. Its life span should not exceed
 /// that of the for operation it refers to.
 class AffineBound {
 public:
   AffineForOp getAffineForOp() { return op; }
   AffineMap getMap() { return map; }
[29 lines not shown]

mlir/include/mlir/Transforms/LoopFusionUtils.h

	Show First 20 Lines • Show All 108 Lines • ▼ Show 20 Lines
	/// NOTE: This function is not feature complete and should only be used in			/// NOTE: This function is not feature complete and should only be used in
	/// testing.			/// testing.
	/// TODO: Update comments when this function is fully implemented.			/// TODO: Update comments when this function is fully implemented.
	FusionResult			FusionResult
	canFuseLoops(AffineForOp srcForOp, AffineForOp dstForOp, unsigned dstLoopDepth,			canFuseLoops(AffineForOp srcForOp, AffineForOp dstForOp, unsigned dstLoopDepth,
	ComputationSliceState *srcSlice,			ComputationSliceState *srcSlice,
	FusionStrategy fusionStrategy = FusionStrategy::Generic);			FusionStrategy fusionStrategy = FusionStrategy::Generic);

	/// Fuses 'srcForOp' into 'dstForOp' with destination loop block insertion point			/// Fuses 'srcForOp' into 'dstForOp' with destination loop block insertion
	/// and source slice loop bounds specified in 'srcSlice'.			/// point and source slice loop bounds specified in 'srcSlice'.
				vinayaka-polymageUnsubmitted Done Reply Inline Actions Nit: `isInnermostInsertionFusion` close backtick. vinayaka-polymage: Nit: `isInnermostInsertionFusion` close backtick.
				/// `isInnermostSiblingInsertionFusion` enables cleanup of `srcForOp that is a
				/// single-iteration reduction loop being sibling-fused into a 'dstForOp'.
	void fuseLoops(AffineForOp srcForOp, AffineForOp dstForOp,			void fuseLoops(AffineForOp srcForOp, AffineForOp dstForOp,
	const ComputationSliceState &srcSlice);			const ComputationSliceState &srcSlice,
				bool isInnermostSiblingInsertionFusion = false);

	/// LoopNestStats aggregates various per-loop statistics (eg. loop trip count			/// LoopNestStats aggregates various per-loop statistics (eg. loop trip count
	/// and operation count) for a loop nest up until (and including) the innermost			/// and operation count) for a loop nest up until (and including) the innermost
	/// loop body.			/// loop body.
	struct LoopNestStats {			struct LoopNestStats {
	/// Map from AffineForOp to immediate child AffineForOps in its loop body.			/// Map from AffineForOp to immediate child AffineForOps in its loop body.
	DenseMap<Operation *, SmallVector<AffineForOp, 2>> loopMap;			DenseMap<Operation *, SmallVector<AffineForOp, 2>> loopMap;
	/// Map from AffineForOp to count of operations in its loop body.			/// Map from AffineForOp to count of operations in its loop body.
	[39 lines not shown]

mlir/lib/Analysis/Utils.cpp

//===- Utils.cpp ---- Misc utilities for analysis -------------------------===//		//===- Utils.cpp ---- Misc utilities for analysis -------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements miscellaneous analysis routines for non-loop IR		// This file implements miscellaneous analysis routines for non-loop IR
// structures.		// structures.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "mlir/Analysis/Utils.h"		#include "mlir/Analysis/Utils.h"
#include "mlir/Analysis/AffineAnalysis.h"		#include "mlir/Analysis/AffineAnalysis.h"
		#include "mlir/Analysis/LoopAnalysis.h"
#include "mlir/Analysis/PresburgerSet.h"		#include "mlir/Analysis/PresburgerSet.h"
#include "mlir/Dialect/Affine/IR/AffineOps.h"		#include "mlir/Dialect/Affine/IR/AffineOps.h"
#include "mlir/Dialect/Affine/IR/AffineValueMap.h"		#include "mlir/Dialect/Affine/IR/AffineValueMap.h"
#include "mlir/Dialect/StandardOps/IR/Ops.h"		#include "mlir/Dialect/StandardOps/IR/Ops.h"
#include "mlir/IR/IntegerSet.h"		#include "mlir/IR/IntegerSet.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
[939 lines not shown]	if (!isSliceValid.hasValue()) {
return SliceComputationResult::GenericFailure;		return SliceComputationResult::GenericFailure;
}		}
if (!isSliceValid.getValue())		if (!isSliceValid.getValue())
return SliceComputationResult::IncorrectSliceFailure;		return SliceComputationResult::IncorrectSliceFailure;

return SliceComputationResult::Success;		return SliceComputationResult::Success;
}		}

		// TODO: extend this to handle multiple result maps.
		static Optional<uint64_t> getConstDifference(AffineMap lbMap, AffineMap ubMap) {
		assert(lbMap.getNumResults() == 1 && "expected single result bound map");
		assert(ubMap.getNumResults() == 1 && "expected single result bound map");
		assert(lbMap.getNumDims() == ubMap.getNumDims());
		assert(lbMap.getNumSymbols() == ubMap.getNumSymbols());
		AffineExpr lbExpr(lbMap.getResult(0));
		AffineExpr ubExpr(ubMap.getResult(0));
		auto loopSpanExpr = simplifyAffineExpr(ubExpr - lbExpr, lbMap.getNumDims(),
		lbMap.getNumSymbols());
		auto cExpr = loopSpanExpr.dyn_cast<AffineConstantExpr>();
		if (!cExpr)
		return None;
		return cExpr.getValue();
		}
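
Note on `getConstDifference`: the helper succeeds only when `ub - lb` simplifies to a constant, that is, when every dim and symbol term cancels. A minimal standalone sketch of that idea follows. `LinearExpr` and `constDifference` are hypothetical stand-ins invented for illustration, not MLIR types, and unlike the original this sketch returns failure instead of asserting when operand counts differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a single-result affine bound:
// value = sum(coeffs[i] * operand[i]) + constant.
struct LinearExpr {
  std::vector<int64_t> coeffs; // one coefficient per dim/symbol operand
  int64_t constant;
};

// Stores ub - lb in *diff and returns true when the difference does not
// depend on any operand; returns false otherwise.
bool constDifference(const LinearExpr &lb, const LinearExpr &ub,
                     int64_t *diff) {
  if (lb.coeffs.size() != ub.coeffs.size())
    return false;
  for (std::size_t i = 0; i < lb.coeffs.size(); ++i)
    if (lb.coeffs[i] != ub.coeffs[i])
      return false; // an operand term survives in ub - lb
  *diff = ub.constant - lb.constant;
  return true;
}
```

For slice bounds of the form `lb = d0`, `ub = d0 + 1`, the sketch reports a difference of 1, which is the unit-trip-count case the patch is interested in.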

		// Builds a map 'tripCountMap' from AffineForOp to constant trip count for loop
		// nest surrounding represented by slice loop bounds in 'slice'. Returns true
		// on success, false otherwise (if a non-constant trip count was encountered).
		// TODO: Make this work with non-unit step loops.
		bool mlir::buildSliceTripCountMap(
		const ComputationSliceState &slice,
		llvm::SmallDenseMap<Operation *, uint64_t, 8> *tripCountMap) {
		unsigned numSrcLoopIVs = slice.ivs.size();
		// Populate map from AffineForOp -> trip count
		for (unsigned i = 0; i < numSrcLoopIVs; ++i) {
		AffineForOp forOp = getForInductionVarOwner(slice.ivs[i]);
		auto *op = forOp.getOperation();
		AffineMap lbMap = slice.lbs[i];
		AffineMap ubMap = slice.ubs[i];
		// If lower or upper bound maps are null or provide no results, it implies
		// that source loop was not at all sliced, and the entire loop will be a
		// part of the slice.
		if (!lbMap || lbMap.getNumResults() == 0 || !ubMap ||
		ubMap.getNumResults() == 0) {
		// The iteration of src loop IV 'i' was not sliced. Use full loop bounds.
		if (forOp.hasConstantLowerBound() && forOp.hasConstantUpperBound()) {
		(*tripCountMap)[op] =
		forOp.getConstantUpperBound() - forOp.getConstantLowerBound();
		continue;
		}
		Optional<uint64_t> maybeConstTripCount = getConstantTripCount(forOp);
		if (maybeConstTripCount.hasValue()) {
		(*tripCountMap)[op] = maybeConstTripCount.getValue();
		continue;
		}
		return false;
		}
		Optional<uint64_t> tripCount = getConstDifference(lbMap, ubMap);
		// Slice bounds are created with a constant ub - lb difference.
		if (!tripCount.hasValue())
		return false;
		(*tripCountMap)[op] = tripCount.getValue();
		}
		return true;
		}

		// Return the number of iterations in the given slice.
		uint64_t mlir::getSliceIterationCount(
		const llvm::SmallDenseMap<Operation *, uint64_t, 8> &sliceTripCountMap) {
		uint64_t iterCount = 1;
		for (const auto &count : sliceTripCountMap) {
		iterCount *= count.second;
		}
		return iterCount;
		}
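
Note on `getSliceIterationCount`: the iteration count of a slice is the product of the per-loop trip counts, so a slice is a "unit slice" exactly when every loop in it runs once. The sketch below illustrates this with standard containers. `TripCountMap`, `sliceIterationCount`, and `isUnitSlice` are hypothetical names, and loops are keyed by a string rather than by `Operation *`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in: loops are identified by name, not Operation *.
using TripCountMap = std::map<std::string, uint64_t>;

// Multiplies the trip counts of all loops in the slice.
uint64_t sliceIterationCount(const TripCountMap &tripCounts) {
  uint64_t iterCount = 1;
  for (const auto &entry : tripCounts)
    iterCount *= entry.second;
  return iterCount;
}

// A slice executes once only if the product of its trip counts is 1.
bool isUnitSlice(const TripCountMap &tripCounts) {
  return sliceIterationCount(tripCounts) == 1;
}
```

This mirrors the `srcIsUnitSlice` lambda in the patch, which additionally requires that building the trip count map succeeded.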

const char *const kSliceFusionBarrierAttrName = "slice_fusion_barrier";		const char *const kSliceFusionBarrierAttrName = "slice_fusion_barrier";
// Computes slice bounds by projecting out any loop IVs from		// Computes slice bounds by projecting out any loop IVs from
// 'dependenceConstraints' at depth greater than 'loopDepth', and computes slice		// 'dependenceConstraints' at depth greater than 'loopDepth', and computes slice
// bounds in 'sliceState' which represent the one loop nest's IVs in terms of		// bounds in 'sliceState' which represent the one loop nest's IVs in terms of
// the other loop nest's IVs, symbols and constants (using 'isBackwardsSlice').		// the other loop nest's IVs, symbols and constants (using 'isBackwardsSlice').
void mlir::getComputationSliceState(		void mlir::getComputationSliceState(
Operation *depSourceOp, Operation *depSinkOp,		Operation *depSourceOp, Operation *depSinkOp,
FlatAffineConstraints *dependenceConstraints, unsigned loopDepth,		FlatAffineConstraints *dependenceConstraints, unsigned loopDepth,
[53 lines not shown]	void mlir::getComputationSliceState(

llvm::SmallDenseSet<Value, 8> sequentialLoops;		llvm::SmallDenseSet<Value, 8> sequentialLoops;
if (isa<AffineReadOpInterface>(depSourceOp) &&		if (isa<AffineReadOpInterface>(depSourceOp) &&
isa<AffineReadOpInterface>(depSinkOp)) {		isa<AffineReadOpInterface>(depSinkOp)) {
// For read-read access pairs, clear any slice bounds on sequential loops.		// For read-read access pairs, clear any slice bounds on sequential loops.
// Get sequential loops in loop nest rooted at 'srcLoopIVs[0]'.		// Get sequential loops in loop nest rooted at 'srcLoopIVs[0]'.
getSequentialLoops(isBackwardSlice ? srcLoopIVs[0] : dstLoopIVs[0],		getSequentialLoops(isBackwardSlice ? srcLoopIVs[0] : dstLoopIVs[0],
&sequentialLoops);		&sequentialLoops);
}		}
// Clear all sliced loop bounds beginning at the first sequential loop, or
// first loop with a slice fusion barrier attribute..
// TODO: Use MemRef read/write regions instead of
// using 'kSliceFusionBarrierAttrName'.
auto getSliceLoop = [&](unsigned i) {		auto getSliceLoop = [&](unsigned i) {
		bondhugula [Done]: This is a stale comment - you may as well drop it in this revision since it's close to your changes.
return isBackwardSlice ? srcLoopIVs[i] : dstLoopIVs[i];		return isBackwardSlice ? srcLoopIVs[i] : dstLoopIVs[i];
};		};
		auto isInnermostInsertion = [&]() {
		return (isBackwardSlice ? loopDepth >= srcLoopIVs.size()
		: loopDepth >= dstLoopIVs.size());
		};
		vinayaka-polymage [Done]: This can directly become a ternary operator?
		llvm::SmallDenseMap<Operation *, uint64_t, 8> sliceTripCountMap;
		auto srcIsUnitSlice = [&]() {
		return (buildSliceTripCountMap(*sliceState, &sliceTripCountMap) &&
		(getSliceIterationCount(sliceTripCountMap) == 1));
		};
		// Clear all sliced loop bounds beginning at the first sequential loop, or
		// first loop with a slice fusion barrier attribute..

for (unsigned i = 0; i < numSliceLoopIVs; ++i) {		for (unsigned i = 0; i < numSliceLoopIVs; ++i) {
Value iv = getSliceLoop(i).getInductionVar();		Value iv = getSliceLoop(i).getInductionVar();
if (sequentialLoops.count(iv) == 0 &&		if (sequentialLoops.count(iv) == 0 &&
getSliceLoop(i)->getAttr(kSliceFusionBarrierAttrName) == nullptr)		getSliceLoop(i)->getAttr(kSliceFusionBarrierAttrName) == nullptr)
continue;		continue;
		// Skip reset of bounds of reduction loop inserted in the destination loop
		// that meets the following conditions:
		// 1. Slice is single trip count.
		bondhugula [Done]: Typo in this clause near "... and have been ...".
		// 2. Loop bounds of the source and destination match.
		// 3. Is being inserted at the innermost insertion point.
		Optional<bool> isMaximal = sliceState->isMaximal();
		if (isLoopParallelAndContainsReduction(getSliceLoop(i)) &&
		isInnermostInsertion() && srcIsUnitSlice() && isMaximal.hasValue() &&
		isMaximal.getValue())
		vinayaka-polymage [Done]: You can directly use `isMaximal` here and update the comment above.
		continue;
for (unsigned j = i; j < numSliceLoopIVs; ++j) {		for (unsigned j = i; j < numSliceLoopIVs; ++j) {
sliceState->lbs[j] = AffineMap();		sliceState->lbs[j] = AffineMap();
sliceState->ubs[j] = AffineMap();		sliceState->ubs[j] = AffineMap();
}		}
break;		break;
}		}
}		}
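
Note on the bound-reset loop above: slice bounds are cleared from the first sequential (or barrier) loop onward, unless that loop is a reduction loop that qualifies for the new exemption. The following standalone sketch models only that control flow. `SliceLoop` and `resetSliceBounds` are hypothetical, and the boolean fields stand in for the real queries (`sequentialLoops.count`, the barrier attribute, `isLoopParallelAndContainsReduction`).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-loop summary of the queries made in the real code.
struct SliceLoop {
  bool sequentialOrBarrier; // sequential, or carries the fusion barrier attr
  bool parallelReduction;   // parallel apart from its reductions
  bool hasBounds;           // slice bounds are currently set
};

// Clears bounds starting at the first non-exempt sequential/barrier loop.
void resetSliceBounds(std::vector<SliceLoop> &loops, bool innermostInsertion,
                      bool unitSlice, bool maximal) {
  for (std::size_t i = 0; i < loops.size(); ++i) {
    if (!loops[i].sequentialOrBarrier)
      continue;
    // Exemption: single-iteration reduction loop, maximal slice, inserted
    // at the innermost insertion point.
    if (loops[i].parallelReduction && innermostInsertion && unitSlice &&
        maximal)
      continue;
    for (std::size_t j = i; j < loops.size(); ++j)
      loops[j].hasBounds = false;
    break;
  }
}
```

With the exemption active, an exempt reduction loop keeps its bounds, which is what later lets `fuseLoops` promote it into the parent loop.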

[191 lines not shown]
Optional<int64_t> mlir::getMemoryFootprintBytes(AffineForOp forOp,		Optional<int64_t> mlir::getMemoryFootprintBytes(AffineForOp forOp,
int memorySpace) {		int memorySpace) {
auto *forInst = forOp.getOperation();		auto *forInst = forOp.getOperation();
return ::getMemoryFootprintBytes(		return ::getMemoryFootprintBytes(
*forInst->getBlock(), Block::iterator(forInst),		*forInst->getBlock(), Block::iterator(forInst),
std::next(Block::iterator(forInst)), memorySpace);		std::next(Block::iterator(forInst)), memorySpace);
}		}

		/// Returns whether a loop is parallel and contains a reduction loop.
		vinayaka-polymage [Not Done]: Added a comment above. I might be confused here, but it seems like the function checks more than whether the loop is a reduction loop or not.
		sumesh13 [Done]: Addressed the comment before.
		vinayaka-polymage [Not Done]: Minor: I think, the function checks that the loop `forOp` is parallel and also contains parallel reductions loops. Line 1348 checks that the outer loop is parallel. Can you please clarify? A sequential outer loop containing a reduction loop inside it will result in `false` when passed to this function, right?
		sumesh13 [Done]: Yes, the intent here was to say that the loop if not for the reduction is parallel and hence need not have its bounds reset. So a sequential outer loop with a reduction inner loop will result in false. Pls let me know if you see an opportunity to broaden the case.
		vinayaka-polymage [Not Done]: OK, so in that case, function name and its description in comments will have to change. As this is a utility, anyone might use it in future for detecting reduction loops. If the check is a custom one, name and description will have to reflect that. Does this make sense?
		sumesh13 [Done]: Sure..I think that makes sense. Any suggestion? How about isNonSequentialDueToReduction?
		vinayaka-polymage [Done]: My suggestion is `isLoopParallelAndContainsReductions`.
		bool mlir::isLoopParallelAndContainsReduction(AffineForOp forOp) {
		SmallVector<LoopReduction> reductions;
		if (!isLoopParallel(forOp, &reductions))
		return false;
		return !reductions.empty();
		bondhugula [Done]: `!reductions.empty()`
		bondhugula [Done]: No need of parentheses?
		}

/// Returns in 'sequentialLoops' all sequential loops in loop nest rooted		/// Returns in 'sequentialLoops' all sequential loops in loop nest rooted
/// at 'forOp'.		/// at 'forOp'.
void mlir::getSequentialLoops(AffineForOp forOp,		void mlir::getSequentialLoops(AffineForOp forOp,
llvm::SmallDenseSet<Value, 8> *sequentialLoops) {		llvm::SmallDenseSet<Value, 8> *sequentialLoops) {
forOp->walk([&](Operation *op) {		forOp->walk([&](Operation *op) {
if (auto innerFor = dyn_cast<AffineForOp>(op))		if (auto innerFor = dyn_cast<AffineForOp>(op))
if (!isLoopParallel(innerFor))		if (!isLoopParallel(innerFor))
sequentialLoops->insert(innerFor.getInductionVar());		sequentialLoops->insert(innerFor.getInductionVar());
[14 lines not shown]

mlir/lib/Dialect/Affine/IR/AffineOps.cpp

	[1,869 lines not shown]
	void mlir::buildAffineLoopNest(			void mlir::buildAffineLoopNest(
	OpBuilder &builder, Location loc, ValueRange lbs, ValueRange ubs,			OpBuilder &builder, Location loc, ValueRange lbs, ValueRange ubs,
	ArrayRef<int64_t> steps,			ArrayRef<int64_t> steps,
	function_ref<void(OpBuilder &, Location, ValueRange)> bodyBuilderFn) {			function_ref<void(OpBuilder &, Location, ValueRange)> bodyBuilderFn) {
	buildAffineLoopNestImpl(builder, loc, lbs, ubs, steps, bodyBuilderFn,			buildAffineLoopNestImpl(builder, loc, lbs, ubs, steps, bodyBuilderFn,
	buildAffineLoopFromValues);			buildAffineLoopFromValues);
	}			}

				AffineForOp mlir::replaceForOpWithNewYields(OpBuilder &b, AffineForOp loop,
				bondhugula [Done]: Since this is exposed in `mlir::`, you'll need to name it something like `cloneForOpWithNewYields`.
				sumesh13 [Done]: Thanks for pointing that out. I have renamed the function to now replaceForOpWithNewYields.
				ValueRange newIterOperands,
				ValueRange newYieldedValues,
				ValueRange newIterArgs,
				bool replaceLoopResults) {
				assert(newIterOperands.size() == newYieldedValues.size() &&
				"newIterOperands must be of the same size as newYieldedValues");
				// Create a new loop before the existing one, with the extra operands.
				OpBuilder::InsertionGuard g(b);
				b.setInsertionPoint(loop);
				auto operands = llvm::to_vector<4>(loop.getIterOperands());
				operands.append(newIterOperands.begin(), newIterOperands.end());
				SmallVector<Value, 4> lbOperands(loop.getLowerBoundOperands());
				SmallVector<Value, 4> ubOperands(loop.getUpperBoundOperands());
				SmallVector<Value, 4> steps(loop.getStep());
				auto lbMap = loop.getLowerBoundMap();
				auto ubMap = loop.getUpperBoundMap();
				AffineForOp newLoop =
				b.create<AffineForOp>(loop.getLoc(), lbOperands, lbMap, ubOperands, ubMap,
				loop.getStep(), operands);
				// Take the body of the original parent loop.
				newLoop.getLoopBody().takeBody(loop.getLoopBody());
				for (Value val : newIterArgs)
				newLoop.getLoopBody().addArgument(val.getType());

				// Update yield operation with new values to be added.
				bondhugula [Done]: Nit: instruction -> operation
				if (!newYieldedValues.empty()) {
				auto yield = cast<AffineYieldOp>(newLoop.getBody()->getTerminator());
				b.setInsertionPoint(yield);
				auto yieldOperands = llvm::to_vector<4>(yield.getOperands());
				yieldOperands.append(newYieldedValues.begin(), newYieldedValues.end());
				b.create<AffineYieldOp>(yield.getLoc(), yieldOperands);
				bondhugula [Not Done]: I think you don't need to materialize the yield operands into a SmallVector - the operand_range will convert to a `ValueRange` and the builder should take a ValueRange.
				sumesh13 [Done]: Can you pls clarify? I am also appending the operands of the yield operation from the inner for loop.
				bondhugula [Done]: I missed that - sorry. It should be fine then.
				yield.erase();
				}
				if (replaceLoopResults) {
				for (auto it : llvm::zip(loop.getResults(), newLoop.getResults().take_front(
				loop.getNumResults()))) {
				std::get<0>(it).replaceAllUsesWith(std::get<1>(it));
				}
				}
				loop.erase();
				return newLoop;
				}

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// AffineIfOp			// AffineIfOp
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	namespace {			namespace {
				bondhugula [Done]: It's actually possible to clone the loop here without cloning the individual ops in the old loop's body. You can have the new loop just take the old loop's body - the old arguments just come along with the body. You need no remapping for the args nor does the IV need any remapping. The new block arguments can be appended and so there is no value remapping needed. With a move, the code becomes much shorter and this also saves a lot of reallocations for large loop bodies.
	/// Remove else blocks that have nothing other than a zero value yield.			/// Remove else blocks that have nothing other than a zero value yield.
	struct SimplifyDeadElse : public OpRewritePattern<AffineIfOp> {			struct SimplifyDeadElse : public OpRewritePattern<AffineIfOp> {
	using OpRewritePattern<AffineIfOp>::OpRewritePattern;			using OpRewritePattern<AffineIfOp>::OpRewritePattern;

	LogicalResult matchAndRewrite(AffineIfOp ifOp,			LogicalResult matchAndRewrite(AffineIfOp ifOp,
	PatternRewriter &rewriter) const override {			PatternRewriter &rewriter) const override {
	if (ifOp.elseRegion().empty() ||			if (ifOp.elseRegion().empty() ||
	!llvm::hasSingleElement(*ifOp.getElseBlock()) || ifOp.getNumResults())			!llvm::hasSingleElement(*ifOp.getElseBlock()) || ifOp.getNumResults())
	[1,593 lines not shown]

mlir/lib/Transforms/LoopFusion.cpp

[1,676 lines not shown]	while (!worklist.empty()) {
}		}
} while (dstNodeChanged);		} while (dstNodeChanged);
}		}
}		}

// Visits each node in the graph, and for each node, attempts to fuse it with		// Visits each node in the graph, and for each node, attempts to fuse it with
// its sibling nodes (nodes which share a parent, but no dependence edges).		// its sibling nodes (nodes which share a parent, but no dependence edges).
void fuseSiblingNodes() {		void fuseSiblingNodes() {
		LLVM_DEBUG(llvm::dbgs() << "--- Sibling Fusion ---\n");
init();		init();
while (!worklist.empty()) {		while (!worklist.empty()) {
unsigned dstId = worklist.back();		unsigned dstId = worklist.back();
worklist.pop_back();		worklist.pop_back();

// Skip if this node was removed (fused into another node).		// Skip if this node was removed (fused into another node).
if (mdg->nodes.count(dstId) == 0)		if (mdg->nodes.count(dstId) == 0)
continue;		continue;
[75 lines not shown]	while (findSiblingNodeToFuse(dstNode, &visitedSibNodeIds, &idAndMemref)) {
depthSliceUnions, maxLegalFusionDepth,		depthSliceUnions, maxLegalFusionDepth,
&bestDstLoopDepth, computeToleranceThreshold))		&bestDstLoopDepth, computeToleranceThreshold))
continue;		continue;
}		}

assert(bestDstLoopDepth > 0 && "Unexpected loop fusion depth");		assert(bestDstLoopDepth > 0 && "Unexpected loop fusion depth");
assert(!depthSliceUnions[bestDstLoopDepth - 1].isEmpty() &&		assert(!depthSliceUnions[bestDstLoopDepth - 1].isEmpty() &&
"Fusion depth has no computed slice union");		"Fusion depth has no computed slice union");
		// Check if source loop is being inserted in the innermost
		// destination loop. Based on this, the fused loop may be optimized
		// further inside `fuseLoops`.
		bool isInnermostInsertion = (bestDstLoopDepth == dstLoopDepthTest);
// Fuse computation slice of 'sibLoopNest' into 'dstLoopNest'.		// Fuse computation slice of 'sibLoopNest' into 'dstLoopNest'.
mlir::fuseLoops(sibAffineForOp, dstAffineForOp,		mlir::fuseLoops(sibAffineForOp, dstAffineForOp,
depthSliceUnions[bestDstLoopDepth - 1]);		depthSliceUnions[bestDstLoopDepth - 1],
		isInnermostInsertion);

		bondhugula [Done]: This change requires a comment.
auto dstForInst = cast<AffineForOp>(dstNode->op);		auto dstForInst = cast<AffineForOp>(dstNode->op);
// Update operation position of fused loop nest (if needed).		// Update operation position of fused loop nest (if needed).
if (insertPointInst != dstForInst.getOperation()) {		if (insertPointInst != dstForInst.getOperation()) {
dstForInst->moveBefore(insertPointInst);		dstForInst->moveBefore(insertPointInst);
}		}
// Update data dependence graph state post fusion.		// Update data dependence graph state post fusion.
updateStateAfterSiblingFusion(sibNode, dstNode);		updateStateAfterSiblingFusion(sibNode, dstNode);
}		}
[181 lines not shown]

mlir/lib/Transforms/Utils/LoopFusionUtils.cpp

[9 lines not shown]
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "mlir/Transforms/LoopFusionUtils.h"		#include "mlir/Transforms/LoopFusionUtils.h"

#include "mlir/Analysis/AffineAnalysis.h"		#include "mlir/Analysis/AffineAnalysis.h"
#include "mlir/Analysis/AffineStructures.h"		#include "mlir/Analysis/AffineStructures.h"
#include "mlir/Analysis/LoopAnalysis.h"		#include "mlir/Analysis/LoopAnalysis.h"
		#include "mlir/Analysis/SliceAnalysis.h"
#include "mlir/Analysis/Utils.h"		#include "mlir/Analysis/Utils.h"
#include "mlir/Dialect/Affine/IR/AffineOps.h"		#include "mlir/Dialect/Affine/IR/AffineOps.h"
#include "mlir/IR/AffineExpr.h"		#include "mlir/IR/AffineExpr.h"
#include "mlir/IR/AffineMap.h"		#include "mlir/IR/AffineMap.h"
#include "mlir/IR/BlockAndValueMapping.h"		#include "mlir/IR/BlockAndValueMapping.h"
#include "mlir/IR/Builders.h"		#include "mlir/IR/Builders.h"
#include "mlir/IR/BuiltinOps.h"		#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/Operation.h"		#include "mlir/IR/Operation.h"
[332 lines not shown]	if (sliceComputationResult.value ==
SliceComputationResult::IncorrectSliceFailure) {		SliceComputationResult::IncorrectSliceFailure) {
LLVM_DEBUG(llvm::dbgs() << "Incorrect slice computation\n");		LLVM_DEBUG(llvm::dbgs() << "Incorrect slice computation\n");
return FusionResult::FailIncorrectSlice;		return FusionResult::FailIncorrectSlice;
}		}

return FusionResult::Success;		return FusionResult::Success;
}		}

		/// Patch the loop body of a forOp that is a single iteration reduction loop
		/// into its containing block.
		bondhugula [Done]: Reflow.
		LogicalResult promoteSingleIterReductionLoop(AffineForOp forOp,
		vinayaka-polymage [Done]: I think, such a merge of an inner loop should happen into its parent loop, when the bounds exactly match. Is this already being taken care of somewhere else?
		sumesh13 [Done]: The loop bounds are reset earlier if the bounds do not match. And in this case the merge is done only for unit slice. Do you still see it necessary to check the bounds?
		sumesh13 [Done]: I have added a clarification.
		bondhugula [Done]: Reflow to use the whole line - here and anywhere else. Please make a sweep as reviewers really can't comment everywhere.
		bondhugula [Done]: This doesn't look really clean to have this method named and implemented this way with a generic single argument, and then having a comment and TODO stating this only should be used for sibling fusion. If we do this, we should have a boolean argument `siblingFusion` set to true/false and it should work for both cases.
		bool siblingFusionUser) {
		bondhugula [Done]: Please return a `LogicalResult` (just like promoteIfSingleIteration).
		// Check if the reduction loop is a single iteration loop.
		Optional<uint64_t> tripCount = getConstantTripCount(forOp);
		if (!tripCount || tripCount.getValue() != 1)
		return failure();
		auto iterOperands = forOp.getIterOperands();
		auto *parentOp = forOp->getParentOp();
		if (!isa<AffineForOp>(parentOp))
		return failure();
		auto newOperands = forOp.getBody()->getTerminator()->getOperands();
		OpBuilder b(parentOp);
		// Replace the parent loop and add iteroperands and results from the `forOp`.
		AffineForOp parentForOp = forOp->getParentOfType<AffineForOp>();
		AffineForOp newLoop = replaceForOpWithNewYields(
		bondhugula [Done]: You don't need a `static_cast` here.
		b, parentForOp, iterOperands, newOperands, forOp.getRegionIterArgs());

		// For sibling-fusion users, collect operations that use the results of the
		// `forOp` outside the new parent loop that has absorbed all its iter args
		// and operands. These operations will be moved later after the results
		// have been replaced.
		SetVector<Operation *> forwardSlice;
		bondhugula [Done]: Reflow - please use whole width. instruction -> operation
		if (siblingFusionUser) {
		for (unsigned i = 0, e = forOp.getNumResults(); i != e; ++i) {
		SetVector<Operation *> tmpForwardSlice;
		getForwardSlice(forOp.getResult(i), &tmpForwardSlice);
		forwardSlice.set_union(tmpForwardSlice);
		}
		}
		// Update the results of the `forOp` in the new loop.
		for (unsigned i = 0, e = forOp.getNumResults(); i != e; ++i) {
		forOp.getResult(i).replaceAllUsesWith(
		newLoop.getResult(i + parentOp->getNumResults()));
		vinayaka-polymage [Done]: As ops are being moved outside, care must be taken to make sure that no dependences are being violated. In sibling fusion, it is OK, as there are read-read dependences. So, can you please clarify if such a promotion of loops should happen only in sibling fusions?
		sumesh13 [Done]: Yeah, I have renamed the option in the caller.
		}
		bondhugula [Done]: We shouldn't be having such TODOs - we should address this via a proper function signature.
		// For sibling-fusion users, move operations that use the results of the
		// `forOp` outside the new parent loop
		if (siblingFusionUser) {
		topologicalSort(forwardSlice);
		for (Operation *op : llvm::reverse(forwardSlice))
		op->moveAfter(newLoop);
		}
		// Replace the induction variable.
		auto iv = forOp.getInductionVar();
		iv.replaceAllUsesWith(newLoop.getInductionVar());
		// Replace the iter args.
		auto forOpIterArgs = forOp.getRegionIterArgs();
		for (auto it : llvm::zip(forOpIterArgs, newLoop.getRegionIterArgs().take_back(
		forOpIterArgs.size()))) {
		bondhugula [Not Done]: Nit: Use `auto *it` to avoid clang-tidy warning.
		sumesh13 [Done]: Didn't see a clang-tidy warning in this case.
		std::get<0>(it).replaceAllUsesWith(std::get<1>(it));
		}
		// Move the loop body operations, except for its terminator, to the loop's
		bondhugula [Done]: I don't think you need `getOperations()` - just `block.splice` may work.
		sumesh13 [Done]: I see there is no splice in block itself.
		// containing block.
		forOp.getBody()->back().erase();
		auto *parentBlock = forOp->getBlock();
		parentBlock->getOperations().splice(Block::iterator(forOp),
		forOp.getBody()->getOperations());
		forOp.erase();
		return success();
		}
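
Note on the final step of `promoteSingleIterReductionLoop`: the loop body is moved, not cloned, into the containing block by erasing the terminator and splicing the remaining operations in front of the loop. A minimal sketch of the same pattern with `std::list` follows. `OpList` and `inlineSingleIterationBody` are hypothetical, and operations are modeled as plain strings.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical stand-in for a block's operation list.
using OpList = std::list<std::string>;

// Moves every op of `body` except its terminator in front of `loopPos`
// inside `parent`, then erases the loop entry itself.
void inlineSingleIterationBody(OpList &parent, OpList::iterator loopPos,
                               OpList &body) {
  if (!body.empty())
    body.pop_back();            // drop the terminator (the yield)
  parent.splice(loopPos, body); // relinks nodes; no element is copied
  parent.erase(loopPos);        // remove the now-empty loop
}
```

Splicing relinks list nodes instead of copying them, which is the same reason the review asked for the body to be taken over with a move instead of cloning each operation.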

/// Fuses 'srcForOp' into 'dstForOp' with destination loop block insertion point		/// Fuses 'srcForOp' into 'dstForOp' with destination loop block insertion point
/// and source slice loop bounds specified in 'srcSlice'.		/// and source slice loop bounds specified in 'srcSlice'.
void mlir::fuseLoops(AffineForOp srcForOp, AffineForOp dstForOp,		void mlir::fuseLoops(AffineForOp srcForOp, AffineForOp dstForOp,
const ComputationSliceState &srcSlice) {		const ComputationSliceState &srcSlice,
		bool isInnermostSiblingInsertion) {
		vinayaka-polymage [Done]: Depending on the question above, if promotion of reducing loops should happen only in sibling fusion, the option here needs to be renamed.
// Clone 'srcForOp' into 'dstForOp' at 'srcSlice->insertPoint'.		// Clone 'srcForOp' into 'dstForOp' at 'srcSlice->insertPoint'.
OpBuilder b(srcSlice.insertPoint->getBlock(), srcSlice.insertPoint);		OpBuilder b(srcSlice.insertPoint->getBlock(), srcSlice.insertPoint);
BlockAndValueMapping mapper;		BlockAndValueMapping mapper;
b.clone(*srcForOp, mapper);		b.clone(*srcForOp, mapper);

// Update 'sliceLoopNest' upper and lower bounds from computed 'srcSlice'.		// Update 'sliceLoopNest' upper and lower bounds from computed 'srcSlice'.
SmallVector<AffineForOp, 4> sliceLoops;		SmallVector<AffineForOp, 4> sliceLoops;
for (unsigned i = 0, e = srcSlice.ivs.size(); i < e; ++i) {		for (unsigned i = 0, e = srcSlice.ivs.size(); i < e; ++i) {
[... 9 lines not shown ...]
}		}
if (AffineMap ubMap = srcSlice.ubs[i]) {		if (AffineMap ubMap = srcSlice.ubs[i]) {
auto ubOperands = srcSlice.ubOperands[i];		auto ubOperands = srcSlice.ubOperands[i];
canonicalizeMapAndOperands(&ubMap, &ubOperands);		canonicalizeMapAndOperands(&ubMap, &ubOperands);
forOp.setUpperBound(ubOperands, ubMap);		forOp.setUpperBound(ubOperands, ubMap);
}		}
}		}

		llvm::SmallDenseMap<Operation *, uint64_t, 8> sliceTripCountMap;
		auto srcIsUnitSlice = [&]() {
		return (buildSliceTripCountMap(srcSlice, &sliceTripCountMap) &&
		(getSliceIterationCount(sliceTripCountMap) == 1));
		};
		// Fix up and if possible, eliminate single iteration loops.
		for (AffineForOp forOp : sliceLoops) {
		if (isLoopParallelAndContainsReduction(forOp) &&
		isInnermostSiblingInsertion && srcIsUnitSlice())
		// Patch reduction loop - only ones that are sibling-fused with the
		bondhugula (Done): Drop blank line.
		// destination loop - into the parent loop.
		(void)promoteSingleIterReductionLoop(forOp, true);
		else
// Promote any single iteration slice loops.		// Promote any single iteration slice loops.
for (AffineForOp forOp : sliceLoops)
(void)promoteIfSingleIteration(forOp);		(void)promoteIfSingleIteration(forOp);
}		}
		}
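
The new cleanup step in `fuseLoops` picks between two promotions for each slice loop. The choice can be summarized as a small pure function, sketched below under stated assumptions: `Cleanup` and `chooseCleanup` are hypothetical names, and the optional iteration count stands in for the combined result of `buildSliceTripCountMap` (which may fail) and `getSliceIterationCount`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical labels for the two cleanup paths taken in the diff.
enum class Cleanup { PromoteReductionLoop, PromoteIfSingleIteration };

// 'sliceIterationCount' is empty when a non-constant trip count was found,
// i.e. when building the trip count map would have returned false.
Cleanup chooseCleanup(bool isParallelReduction,
                      bool isInnermostSiblingInsertion,
                      std::optional<uint64_t> sliceIterationCount) {
  const bool srcIsUnitSlice =
      sliceIterationCount.has_value() && *sliceIterationCount == 1;
  if (isParallelReduction && isInnermostSiblingInsertion && srcIsUnitSlice)
    return Cleanup::PromoteReductionLoop;
  // Every other loop falls back to the pre-existing behaviour.
  return Cleanup::PromoteIfSingleIteration;
}
```

All three conditions must hold for the reduction-specific path, so a reduction loop fused at a non-innermost level, or one whose slice covers more than one iteration, is handled exactly as before.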

/// Collect loop nest statistics (eg. loop trip count and operation count)		/// Collect loop nest statistics (eg. loop trip count and operation count)
/// in 'stats' for loop nest rooted at 'forOp'. Returns true on success,		/// in 'stats' for loop nest rooted at 'forOp'. Returns true on success,
/// returns false otherwise.		/// returns false otherwise.
bool mlir::getLoopNestStats(AffineForOp forOpRoot, LoopNestStats *stats) {		bool mlir::getLoopNestStats(AffineForOp forOpRoot, LoopNestStats *stats) {
auto walkResult = forOpRoot.walk([&](AffineForOp forOp) {		auto walkResult = forOpRoot.walk([&](AffineForOp forOp) {
auto *childForOp = forOp.getOperation();		auto *childForOp = forOp.getOperation();
auto *parentForOp = forOp->getParentOp();		auto *parentForOp = forOp->getParentOp();
[... 72 lines not shown ...]
if (tripCountOverrideMap != nullptr) {		if (tripCountOverrideMap != nullptr) {
if (it != tripCountOverrideMap->end()) {		if (it != tripCountOverrideMap->end()) {
tripCount = it->second;		tripCount = it->second;
}		}
}		}
// Returns the total number of dynamic instances of operations in loop body.		// Returns the total number of dynamic instances of operations in loop body.
return tripCount * opCount;		return tripCount * opCount;
}		}

// TODO: extend this to handle multiple result maps.
static Optional<uint64_t> getConstDifference(AffineMap lbMap, AffineMap ubMap) {
assert(lbMap.getNumResults() == 1 && "expected single result bound map");
assert(ubMap.getNumResults() == 1 && "expected single result bound map");
assert(lbMap.getNumDims() == ubMap.getNumDims());
assert(lbMap.getNumSymbols() == ubMap.getNumSymbols());
AffineExpr lbExpr(lbMap.getResult(0));
AffineExpr ubExpr(ubMap.getResult(0));
auto loopSpanExpr = simplifyAffineExpr(ubExpr - lbExpr, lbMap.getNumDims(),
lbMap.getNumSymbols());
auto cExpr = loopSpanExpr.dyn_cast<AffineConstantExpr>();
if (!cExpr)
return None;
return cExpr.getValue();
}
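
`getConstDifference` above returns a value only when `ub - lb` simplifies to a constant, which happens when the non-constant parts of both bounds cancel. A minimal sketch of that idea, assuming bounds are plain linear expressions rather than `AffineMap`s: `LinearBound` and `constDifference` are hypothetical names, and the sketch uses a signed result for simplicity where the original returns `Optional<uint64_t>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of a single-result bound:
//   coeffs[0]*x0 + coeffs[1]*x1 + ... + constant
struct LinearBound {
  std::vector<int64_t> coeffs; // one coefficient per dim/symbol
  int64_t constant;
};

// Returns ub - lb if the difference is a compile-time constant, i.e. if all
// dim/symbol coefficients cancel; otherwise returns std::nullopt.
std::optional<int64_t> constDifference(const LinearBound &lb,
                                       const LinearBound &ub) {
  if (lb.coeffs.size() != ub.coeffs.size())
    return std::nullopt;
  for (std::size_t i = 0; i < lb.coeffs.size(); ++i)
    if (lb.coeffs[i] != ub.coeffs[i])
      return std::nullopt;
  return ub.constant - lb.constant;
}
```

For example, the slice bounds `d0` and `d0 + 1` differ by the constant 1, whereas `d0` and `2 * d0` have no constant difference.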

// Return the number of iterations in the given slice.
static uint64_t getSliceIterationCount(
const llvm::SmallDenseMap<Operation *, uint64_t, 8> &sliceTripCountMap) {
uint64_t iterCount = 1;
for (const auto &count : sliceTripCountMap) {
iterCount *= count.second;
}
return iterCount;
}

// Builds a map 'tripCountMap' from AffineForOp to constant trip count for loop
// nest surrounding represented by slice loop bounds in 'slice'.
// Returns true on success, false otherwise (if a non-constant trip count
// was encountered).
// TODO: Make this work with non-unit step loops.
static bool buildSliceTripCountMap(
const ComputationSliceState &slice,
llvm::SmallDenseMap<Operation *, uint64_t, 8> *tripCountMap) {
unsigned numSrcLoopIVs = slice.ivs.size();
// Populate map from AffineForOp -> trip count
for (unsigned i = 0; i < numSrcLoopIVs; ++i) {
AffineForOp forOp = getForInductionVarOwner(slice.ivs[i]);
auto *op = forOp.getOperation();
AffineMap lbMap = slice.lbs[i];
AffineMap ubMap = slice.ubs[i];
// If lower or upper bound maps are null or provide no results, it implies
// that source loop was not at all sliced, and the entire loop will be a
// part of the slice.
if (!lbMap || lbMap.getNumResults() == 0 || !ubMap ||
ubMap.getNumResults() == 0) {
// The iteration of src loop IV 'i' was not sliced. Use full loop bounds.
if (forOp.hasConstantLowerBound() && forOp.hasConstantUpperBound()) {
(*tripCountMap)[op] =
forOp.getConstantUpperBound() - forOp.getConstantLowerBound();
continue;
}
Optional<uint64_t> maybeConstTripCount = getConstantTripCount(forOp);
if (maybeConstTripCount.hasValue()) {
(*tripCountMap)[op] = maybeConstTripCount.getValue();
continue;
}
return false;
}
Optional<uint64_t> tripCount = getConstDifference(lbMap, ubMap);
// Slice bounds are created with a constant ub - lb difference.
if (!tripCount.hasValue())
return false;
(*tripCountMap)[op] = tripCount.getValue();
}
return true;
}
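
The two helpers above work together: one records a constant trip count per slice loop and fails if any loop lacks one, and the other multiplies the recorded counts. The sketch below shows that pairing with standard containers only. `LoopSlice`, `buildTripCountMap`, and `iterationCount` are hypothetical stand-ins, and loops are keyed by name instead of by `Operation *`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical description of one slice loop: its name and, if known,
// its constant trip count.
struct LoopSlice {
  std::string name;
  std::optional<uint64_t> constTripCount;
};

// Fills 'tripCountMap' with one entry per loop. Returns false as soon as a
// loop without a constant trip count is encountered.
bool buildTripCountMap(const std::vector<LoopSlice> &loops,
                       std::map<std::string, uint64_t> *tripCountMap) {
  for (const LoopSlice &loop : loops) {
    if (!loop.constTripCount)
      return false;
    (*tripCountMap)[loop.name] = *loop.constTripCount;
  }
  return true;
}

// Total number of slice iterations: the product of all trip counts.
uint64_t iterationCount(const std::map<std::string, uint64_t> &tripCountMap) {
  uint64_t count = 1;
  for (const auto &entry : tripCountMap)
    count *= entry.second;
  return count;
}
```

A slice is a unit slice exactly when the map builds successfully and the product is 1, which is the condition the `srcIsUnitSlice` lambda in `fuseLoops` checks.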

/// Computes the total cost of the loop nest rooted at 'forOp' using 'stats'.		/// Computes the total cost of the loop nest rooted at 'forOp' using 'stats'.
/// Currently, the total cost is computed by counting the total operation		/// Currently, the total cost is computed by counting the total operation
/// instance count (i.e. total number of operations in the loop body * loop		/// instance count (i.e. total number of operations in the loop body * loop
/// trip count) for the entire loop nest.		/// trip count) for the entire loop nest.
int64_t mlir::getComputeCost(AffineForOp forOp, LoopNestStats &stats) {		int64_t mlir::getComputeCost(AffineForOp forOp, LoopNestStats &stats) {
return getComputeCostHelper(forOp.getOperation(), stats,		return getComputeCostHelper(forOp.getOperation(), stats,
/*tripCountOverrideMap=*/nullptr,		/*tripCountOverrideMap=*/nullptr,
/*computeCostMap=*/nullptr);		/*computeCostMap=*/nullptr);
[... 93 lines not shown ...]

mlir/test/Transforms/loop-fusion.mlir

	[... 3,144 lines not shown ...]
	// CHECK-NEXT: affine.store			// CHECK-NEXT: affine.store
	// CHECK: affine.for			// CHECK: affine.for
	// CHECK-NEXT: affine.load			// CHECK-NEXT: affine.load
	// CHECK-NEXT: mulf			// CHECK-NEXT: mulf
	// CHECK-NEXT: affine.store			// CHECK-NEXT: affine.store

	// -----			// -----

				// MAXIMAL-LABEL: func @reduce_add_f32_f32(
				bondhugula (Not Done): This whole test case has become too long (3144 lines) and we should really split this file - perhaps right after this PR in an NFC PR.
				sumesh13 (Author, Done): Sure...I can move the purely sibling fusion related tests to a new file.
				func @reduce_add_f32_f32(%arg0: memref<64x64xf32, 1>, %arg1: memref<1x64xf32, 1>, %arg2: memref<1x64xf32, 1>) {
				%cst_0 = constant 0.000000e+00 : f32
				%cst_1 = constant 1.000000e+00 : f32
				%0 = memref.alloca() : memref<f32, 1>
				%1 = memref.alloca() : memref<f32, 1>
				affine.for %arg3 = 0 to 1 {
				affine.for %arg4 = 0 to 64 {
				%accum = affine.for %arg5 = 0 to 64 iter_args (%prevAccum = %cst_0) -> f32 {
				%4 = affine.load %arg0[%arg5, %arg4] : memref<64x64xf32, 1>
				%5 = addf %prevAccum, %4 : f32
				affine.yield %5 : f32
				}
				%accum_dbl = addf %accum, %accum : f32
				affine.store %accum_dbl, %arg1[%arg3, %arg4] : memref<1x64xf32, 1>
				}
				}
				affine.for %arg3 = 0 to 1 {
				affine.for %arg4 = 0 to 64 {
				%accum = affine.for %arg5 = 0 to 64 iter_args (%prevAccum = %cst_1) -> f32 {
				%4 = affine.load %arg0[%arg5, %arg4] : memref<64x64xf32, 1>
				%5 = mulf %prevAccum, %4 : f32
				affine.yield %5 : f32
				}
				%accum_sqr = mulf %accum, %accum : f32
				affine.store %accum_sqr, %arg2[%arg3, %arg4] : memref<1x64xf32, 1>
				}
				}
				return
				}
				// The two loops here get maximally sibling-fused at the innermost
				// insertion point. Test checks if the innermost reduction loop of the fused loop
				// gets promoted into its outerloop.
				// MAXIMAL-SAME: %[[arg_0:.*]]: memref<64x64xf32, 1>,
				// MAXIMAL-SAME: %[[arg_1:.*]]: memref<1x64xf32, 1>,
				// MAXIMAL-SAME: %[[arg_2:.*]]: memref<1x64xf32, 1>) {
				// MAXIMAL: %[[cst:.*]] = constant 0 : index
				// MAXIMAL-NEXT: %[[cst_0:.*]] = constant 0.000000e+00 : f32
				// MAXIMAL-NEXT: %[[cst_1:.*]] = constant 1.000000e+00 : f32
				// MAXIMAL: affine.for %[[idx_0:.*]] = 0 to 1 {
				// MAXIMAL-NEXT: affine.for %[[idx_1:.*]] = 0 to 64 {
				// MAXIMAL-NEXT: %[[results:.*]]:2 = affine.for %[[idx_2:.*]] = 0 to 64 iter_args(%[[iter_0:.*]] = %[[cst_1]], %[[iter_1:.*]] = %[[cst_0]]) -> (f32, f32) {
				// MAXIMAL-NEXT: %[[val_0:.*]] = affine.load %[[arg_0]][%[[idx_2]], %[[idx_1]]] : memref<64x64xf32, 1>
				// MAXIMAL-NEXT: %[[reduc_0:.*]] = addf %[[iter_1]], %[[val_0]] : f32
				// MAXIMAL-NEXT: %[[val_1:.*]] = affine.load %[[arg_0]][%[[idx_2]], %[[idx_1]]] : memref<64x64xf32, 1>
				// MAXIMAL-NEXT: %[[reduc_1:.*]] = mulf %[[iter_0]], %[[val_1]] : f32
				// MAXIMAL-NEXT: affine.yield %[[reduc_1]], %[[reduc_0]] : f32, f32
				// MAXIMAL-NEXT: }
				// MAXIMAL-NEXT: %[[reduc_0_dbl:.*]] = addf %[[results:.*]]#1, %[[results]]#1 : f32
				// MAXIMAL-NEXT: affine.store %[[reduc_0_dbl]], %[[arg_1]][%[[cst]], %[[idx_1]]] : memref<1x64xf32, 1>
				// MAXIMAL-NEXT: %[[reduc_1_sqr:.*]] = mulf %[[results]]#0, %[[results]]#0 : f32
				// MAXIMAL-NEXT: affine.store %[[reduc_1_sqr]], %[[arg_2]][%[[idx_0]], %[[idx_1]]] : memref<1x64xf32, 1>
				// MAXIMAL-NEXT: }
				// MAXIMAL-NEXT: }
				// MAXIMAL-NEXT: return
				// MAXIMAL-NEXT: }
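
The MAXIMAL checks above expect the two sibling reductions to end up in a single loop that carries two `iter_args`. As a plain C++ analogy, the sketch below computes a column sum and a column product first as two separate loops and then as one fused loop with two accumulators. The matrix size `N` and all function names are assumptions made for illustration; the test itself uses 64x64 memrefs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

constexpr std::size_t N = 4; // assumed size; the test uses 64
using Matrix = std::array<std::array<float, N>, N>;

// Before fusion: each reduction walks the column in its own loop.
float columnSum(const Matrix &a, std::size_t col) {
  float acc = 0.0f;
  for (std::size_t row = 0; row < N; ++row)
    acc += a[row][col];
  return acc;
}

float columnProduct(const Matrix &a, std::size_t col) {
  float acc = 1.0f;
  for (std::size_t row = 0; row < N; ++row)
    acc *= a[row][col];
  return acc;
}

// After fusion: one loop carries both accumulators, like a single
// affine.for with two iter_args. Returns {product, sum}, matching the
// result order (#0 = mulf, #1 = addf) in the expected output.
std::pair<float, float> fusedColumnReductions(const Matrix &a,
                                              std::size_t col) {
  float product = 1.0f;
  float sum = 0.0f;
  for (std::size_t row = 0; row < N; ++row) {
    sum += a[row][col];
    product *= a[row][col];
  }
  return {product, sum};
}
```

Both forms read the same elements in the same order, so the fused loop yields the same two results while traversing the column once.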

				// -----

				// CHECK-LABEL: func @reduce_add_non_innermost
				func @reduce_add_non_innermost(%arg0: memref<64x64xf32, 1>, %arg1: memref<1x64xf32, 1>, %arg2: memref<1x64xf32, 1>) {
				%cst = constant 0.000000e+00 : f32
				%cst_0 = constant 1.000000e+00 : f32
				%0 = memref.alloca() : memref<f32, 1>
				%1 = memref.alloca() : memref<f32, 1>
				affine.for %arg3 = 0 to 1 {
				affine.for %arg4 = 0 to 64 {
				%accum = affine.for %arg5 = 0 to 64 iter_args (%prevAccum = %cst) -> f32 {
				%4 = affine.load %arg0[%arg5, %arg4] : memref<64x64xf32, 1>
				%5 = addf %prevAccum, %4 : f32
				affine.yield %5 : f32
				}
				%accum_dbl = addf %accum, %accum : f32
				affine.store %accum_dbl, %arg1[%arg3, %arg4] : memref<1x64xf32, 1>
				}
				}
				affine.for %arg3 = 0 to 1 {
				affine.for %arg4 = 0 to 64 {
				%accum = affine.for %arg5 = 0 to 64 iter_args (%prevAccum = %cst_0) -> f32 {
				%4 = affine.load %arg0[%arg5, %arg4] : memref<64x64xf32, 1>
				%5 = mulf %prevAccum, %4 : f32
				affine.yield %5 : f32
				}
				%accum_sqr = mulf %accum, %accum : f32
				affine.store %accum_sqr, %arg2[%arg3, %arg4] : memref<1x64xf32, 1>
				}
				}
				return
				}
				// Test checks the loop structure is preserved after sibling fusion.
				// CHECK: affine.for
				// CHECK-NEXT: affine.for
				// CHECK-NEXT: affine.for
				// CHECK: affine.for

				// -----
				func @reduce_add_non_maximal_f32_f32(%arg0: memref<64x64xf32, 1>, %arg1 : memref<1x64xf32, 1>, %arg2 : memref<1x64xf32, 1>) {
				%cst_0 = constant 0.000000e+00 : f32
				%cst_1 = constant 1.000000e+00 : f32
				affine.for %arg3 = 0 to 1 {
				affine.for %arg4 = 0 to 64 {
				%accum = affine.for %arg5 = 0 to 64 iter_args (%prevAccum = %cst_0) -> f32 {
				%4 = affine.load %arg0[%arg5, %arg4] : memref<64x64xf32, 1>
				%5 = addf %prevAccum, %4 : f32
				affine.yield %5 : f32
				}
				%accum_dbl = addf %accum, %accum : f32
				affine.store %accum_dbl, %arg1[%arg3, %arg4] : memref<1x64xf32, 1>
				}
				}
				affine.for %arg3 = 0 to 1 {
				affine.for %arg4 = 0 to 64 {
				// Following loop trip count does not match the corresponding source trip count.
				bondhugula (Done): This trailing comment isn't clear. Please use a separate line and rephrase a bit.
				%accum = affine.for %arg5 = 0 to 32 iter_args (%prevAccum = %cst_1) -> f32 {
				%4 = affine.load %arg0[%arg5, %arg4] : memref<64x64xf32, 1>
				%5 = mulf %prevAccum, %4 : f32
				affine.yield %5 : f32
				}
				%accum_sqr = mulf %accum, %accum : f32
				affine.store %accum_sqr, %arg2[%arg3, %arg4] : memref<1x64xf32, 1>
				}
				}
				return
				}
				// Test checks the loop structure is preserved after sibling fusion
				// since the destination loop and source loop trip counts do not
				// match.
				// MAXIMAL-LABEL: func @reduce_add_non_maximal_f32_f32(
				// MAXIMAL: %[[cst_0:.*]] = constant 0.000000e+00 : f32
				// MAXIMAL-NEXT: %[[cst_1:.*]] = constant 1.000000e+00 : f32
				// MAXIMAL-NEXT: affine.for %[[idx_0:.*]]= 0 to 1 {
				// MAXIMAL-NEXT: affine.for %[[idx_1:.*]] = 0 to 64 {
				// MAXIMAL-NEXT: %[[result_1:.*]] = affine.for %[[idx_2:.*]] = 0 to 32 iter_args(%[[iter_0:.*]] = %[[cst_1]]) -> (f32) {
				// MAXIMAL-NEXT: %[[result_0:.*]] = affine.for %[[idx_3:.*]] = 0 to 64 iter_args(%[[iter_1:.*]] = %[[cst_0]]) -> (f32) {

				// -----

	// CHECK-LABEL: func @fuse_large_number_of_loops			// CHECK-LABEL: func @fuse_large_number_of_loops
	func @fuse_large_number_of_loops(%arg0: memref<20x10xf32, 1>, %arg1: memref<20x10xf32, 1>, %arg2: memref<20x10xf32, 1>, %arg3: memref<20x10xf32, 1>, %arg4: memref<20x10xf32, 1>, %arg5: memref<f32, 1>, %arg6: memref<f32, 1>, %arg7: memref<f32, 1>, %arg8: memref<f32, 1>, %arg9: memref<20x10xf32, 1>, %arg10: memref<20x10xf32, 1>, %arg11: memref<20x10xf32, 1>, %arg12: memref<20x10xf32, 1>) {			func @fuse_large_number_of_loops(%arg0: memref<20x10xf32, 1>, %arg1: memref<20x10xf32, 1>, %arg2: memref<20x10xf32, 1>, %arg3: memref<20x10xf32, 1>, %arg4: memref<20x10xf32, 1>, %arg5: memref<f32, 1>, %arg6: memref<f32, 1>, %arg7: memref<f32, 1>, %arg8: memref<f32, 1>, %arg9: memref<20x10xf32, 1>, %arg10: memref<20x10xf32, 1>, %arg11: memref<20x10xf32, 1>, %arg12: memref<20x10xf32, 1>) {
	%cst = constant 1.000000e+00 : f32			%cst = constant 1.000000e+00 : f32
	%0 = memref.alloc() : memref<f32, 1>			%0 = memref.alloc() : memref<f32, 1>
	affine.store %cst, %0[] : memref<f32, 1>			affine.store %cst, %0[] : memref<f32, 1>
	%1 = memref.alloc() : memref<20x10xf32, 1>			%1 = memref.alloc() : memref<20x10xf32, 1>
	affine.for %arg13 = 0 to 20 {			affine.for %arg13 = 0 to 20 {
	affine.for %arg14 = 0 to 10 {			affine.for %arg14 = 0 to 10 {
	[... 172 lines not shown ...]

Revision Contents

Diff 358688

mlir/include/mlir/Analysis/Utils.h

mlir/include/mlir/Dialect/Affine/IR/AffineOps.h

mlir/include/mlir/Transforms/LoopFusionUtils.h

mlir/lib/Analysis/Utils.cpp

mlir/lib/Dialect/Affine/IR/AffineOps.cpp

mlir/lib/Transforms/LoopFusion.cpp

mlir/lib/Transforms/Utils/LoopFusionUtils.cpp

mlir/test/Transforms/loop-fusion.mlir
