This is an archive of the discontinued LLVM Phabricator instance.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1879–1897	This block is quite difficult to parse. I would recommend factoring it out into a helper or reworking it to avoid multiple nested if/else statements. This could be as simple as getting your three block operands, checking for casts, seeing if there is a mul between input/kernel, and then finding the combiner with the accumulator. Special checks (e.g. finding the combiner) can then be done by helpers to avoid nesting the list inside of the entire block. It would provide a more step-by-step process and avoid parsing deep parts of the implementation.
1940–1941	There appears to be a decent amount of shared implementation with the convolution implementation. I would recommend maintaining shared lines for shared implementation.
1970–1971	Ditto about the shared implementation.
2066–2077	Often if you have an if-else block in a simple loop (like here) it is simpler to write `for (...) { if (condition) { do work; continue; } do other work; }`. This avoids the extra else statement and demonstrates there is only one task.
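A minimal sketch of the early-`continue` shape suggested for 2066–2077, in plain C++. The function `labelValues` and its data are hypothetical and are not taken from the patch; they only illustrate the control flow.

```cpp
#include <string>
#include <vector>

// Hypothetical example: label each value, handling the special case first.
std::vector<std::string> labelValues(const std::vector<int> &values) {
  std::vector<std::string> labels;
  for (int v : values) {
    if (v < 0) {
      // Special case: do its work, then move on to the next element.
      labels.push_back("negative");
      continue;
    }
    // Main path: no else branch is needed.
    labels.push_back("value " + std::to_string(v));
  }
  return labels;
}
```

The main path stays at the loop's top indentation level, which is the readability gain the comment is after.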
2390–2393	Why are these being handled as globals vs being included as parameters to the builder?

This revision now requires changes to proceed. Dec 16 2022, 11:13 AM

vmurali added inline comments. Dec 16 2022, 11:45 AM

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
2390–2393	The code works in two steps - the constructor checks for validity of the op and sets these values, and the transformer (`conv`) performs the transformation post construction. To keep the state between the construction and transformation, I need these values. (They are inside a class, so not really global.)

vmurali added inline comments. Dec 16 2022, 11:51 AM

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
2390–2393	Another thing to note is that we never needed to specify the cast op or the reduction op for convolution because (a) convolution was always a sum of products, and (b) contraction automatically took care of casting. But for pooling, neither of these is true (as we have multiple kinds of pooling, and we don't use contraction for the equivalent of reduction). Finally, the `isPool` state is simply to distinguish between pooling and convolution. The only state that can potentially be eliminated is `isPoolExt` and rely on null strings for no reduction - I find that distasteful.

vmurali updated this revision to Diff 483628. Dec 16 2022, 12:06 PM

vmurali marked 3 inline comments as done.

vmurali marked an inline comment as done.

Harbormaster completed remote builds in B203687: Diff 483628. Dec 16 2022, 1:17 PM

Addressed Rob's comments

Addressed all of Rob's comments

Clang format

Harbormaster completed remote builds in B203708: Diff 483656. Dec 16 2022, 2:57 PM

It looks good! Doing a first pass.

Design question: Wondering if it would make more sense to decouple conv and pooling implementation while sharing the common code. The fact that we have to preserve the isPool state and add conditional code all over the place is suggesting that something is off. Not sure how feasible this is but it would be good to give it a thought.

A couple of style comments:

Single statement ifs and fors shouldn't have { }.
Period at the end of a comment.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1825	Nit: If they are not widely used, I would write these predicates in place to prevent an explosion of utilities with different combinations. If you need a local predicate, you can write a local lambda. For example, since `!isa<arith::MulIOp, arith::MulFOp>(op)` is only used once, I would write it as: `if (!isa<arith::MulIOp, arith::MulFOp>(op) && llvm::any_of(op->getOperands(), ...))`
1878–1887	Is it possible to simplify this code? If I parse this correctly, the loop body beyond the `isa<BlockArgument>` check will be executed only once as `feedOp = getDefiningOp()` will never be null. I would perhaps add a few simpler checks using `llvm::count_if` to check that there is only one operand that is not a block argument, then `llvm::find_if` to get it, and move the remaining code out of the loop. Just a quick suggestion, there must be other ways. Perhaps it is also a good idea to separate the two `if`s described in the comments if that simplifies the code. I would prioritize simplicity over performance as the number of operands is very small.
1879–1897	What about something like `if (!maybeKind || !isSupportedConvKind(isPool)) return;`?
1899	nit: `!(A == B && C == D)` -> `A != B || C != D`? It seems easier to parse.
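The `count_if` then `find_if` suggestion for 1878–1887 could be sketched as follows. This is only an illustration under stated assumptions: it uses the standard-library algorithms rather than the `llvm::` range wrappers, and `Operand` and `getSingleFeedingOp` are hypothetical stand-ins, not MLIR types or code from the patch.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// Hypothetical stand-in for an operand: either a block argument or the
// result of some defining operation (identified here by an integer id).
struct Operand {
  bool isBlockArgument;
  int definingOpId; // Only meaningful when isBlockArgument is false.
};

// Returns the id of the single operand that is not a block argument, or
// std::nullopt if there is not exactly one such operand.
std::optional<int> getSingleFeedingOp(const std::vector<Operand> &operands) {
  auto isNotBlockArg = [](const Operand &o) { return !o.isBlockArgument; };
  if (std::count_if(operands.begin(), operands.end(), isNotBlockArg) != 1)
    return std::nullopt;
  auto it = std::find_if(operands.begin(), operands.end(), isNotBlockArg);
  return it->definingOpId;
}
```

With the single non-block-argument operand extracted up front, the remaining checks can run outside any loop. The two passes over the operand list are acceptable here because, as the comment notes, the number of operands is very small.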

Thanks for the review, I will address some of it and submit again for review.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1825	Rob was suggesting that I create specific functions for these. I can replace the for-loop with `llvm::all_of`. Maybe I can inline the multiply part alone after switching to llvm::all_of
1878–1887	I just modified what existed earlier - look at `mulOp` in the earlier code. But yes, I am just checking if there's at most one non-BlockArgument, and if I find it, I make sure it's either a MulOp of block arguments or casts of block arguments (convolution), or a cast of a block argument (pooling).
1879–1897	I think there are other CombiningKinds that are not used in Convolution or Pooling. We should have an explicit list for convolution and pooling and not rely on not-convolution being pooling.
1899	For me, `!(A && B)` is a lot more readable because I enumerate the valid conditions in my head and simply negate it for invalid.
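The two forms debated for line 1899 are equivalent by De Morgan's law. A small sketch with hypothetical function names, where the compiler checks a few sample points:

```cpp
// Form preferred by the author: state the valid condition, then negate it.
constexpr bool isInvalidNegated(int a, int b, int c, int d) {
  return !(a == b && c == d);
}

// Form suggested by the reviewers: De Morgan's law applied term by term.
constexpr bool isInvalidDistributed(int a, int b, int c, int d) {
  return a != b || c != d;
}

// Compile-time spot checks that the two forms agree.
static_assert(isInvalidNegated(1, 1, 2, 2) == isInvalidDistributed(1, 1, 2, 2),
              "both pairs equal");
static_assert(isInvalidNegated(1, 0, 2, 2) == isInvalidDistributed(1, 0, 2, 2),
              "first pair differs");
static_assert(isInvalidNegated(1, 1, 2, 0) == isInvalidDistributed(1, 1, 2, 0),
              "second pair differs");
static_assert(isInvalidNegated(1, 0, 2, 0) == isInvalidDistributed(1, 0, 2, 0),
              "both pairs differ");
```

Since the forms are logically identical, the choice between them is purely about which one a reader parses more reliably.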

Addressing Diego's comments

vmurali marked 4 inline comments as done. Dec 16 2022, 8:25 PM

Harbormaster completed remote builds in B203748: Diff 483716. Dec 16 2022, 8:54 PM

hanchung added a reviewer: antiagainst. Dec 19 2022, 2:46 PM

In D140188#4002828, @dcaballe wrote:

It looks good! Doing a first pass.

Design question: Wondering if it would make more sense to decouple conv and pooling implementation while sharing the common code. The fact that we have to preserve the isPool state and add conditional code all over the place is suggesting that something is off. Not sure how feasible this is but it would be good to give it a thought.

A couple of style comments:

Single statement ifs and fors shouldn't have { }.

Period at the end of a comment.

Big +1. We should decouple conv and pooling implementation. We should also factor common code out and avoid having isPool everywhere?

Also, can we use an enum instead of having isPool and isPoolExt? Using an enum is much simpler because the number of boolean combinations can grow exponentially, and the combinations will end up poorly documented.

(I did not look into too much implementation because the design would change.)

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1801–1825	llvm style nit: Don’t Use Braces on Simple Single-Statement Bodies of if/else/loop Statements https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements
1802	A CastOpInterface is allowed to have multiple inputs. However, we unconditionally check the first operand. That should be taken into account.
1816–1822	[optional] I think using `case switch` captures this better. That's also what we used in VectorOps.cpp. E.g.,
	static bool isSupportedCombiningKind(CombiningKind combiningKind, Type elementType) {
	  switch (combiningKind) {
	  case CombiningKind::ADD:
	  case CombiningKind::MUL:
	    return elementType.isIntOrIndexOrFloat();
	  case CombiningKind::MINUI:
	  case CombiningKind::MINSI:
	  case CombiningKind::MAXUI:
	  case CombiningKind::MAXSI:
	  case CombiningKind::AND:
	  case CombiningKind::OR:
	  case CombiningKind::XOR:
	    return elementType.isIntOrIndex();
	  case CombiningKind::MINF:
	  case CombiningKind::MAXF:
	    return elementType.isa<FloatType>();
	  }
	  return false;
	}
1861–1864	should we just check if the op is a LinalgConvolutionOpInterface? Maybe @antiagainst should weigh in because he's the author of the implementation.
1899	I think we should follow the pattern that most people use in MLIR/LLVM. In my experience browsing and reviewing code, I've seen the `A != B || C != D` form a lot, and I seldom see the `!(A && B)` form in the codebase.

In D140188#4006379, @hanchung wrote:

In D140188#4002828, @dcaballe wrote:

It looks good! Doing a first pass.

Design question: Wondering if it would make more sense to decouple conv and pooling implementation while sharing the common code. The fact that we have to preserve the isPool state and add conditional code all over the place is suggesting that something is off. Not sure how feasible this is but it would be good to give it a thought.

A couple of style comments:

Single statement ifs and fors shouldn't have { }.

Period at the end of a comment.

Big +1. We should decouple conv and pooling implementation. We should also factor common code out and avoid having isPool everywhere?

Also, can we use an enum instead of having isPool and isPoolExt? Using an enum is much simpler because the number of boolean combinations can grow exponentially, and the combinations will end up poorly documented.

(I did not look into too much implementation because the design would change.)

There's contradictory feedback here about separating out pooling (the earlier review, for instance). There are only a few places where isPool is checked (8 places after the constructor, 4 of which can be removed without affecting functionality, so effectively just 4 places) and the amount of shared code is big. When we had brainstormed the design, we explicitly decided to not create a new transformation for pooling. Pooling ops implement the convolution interface, precisely to combine it with convolution.

Just to reiterate, the only place where a distinction exists is in binding the kernel dimensions and in creating the final reduction op.

Also, can we use an enum instead of having isPool and isPoolExt? Using an enum is much simpler because the number of boolean combinations can grow exponentially, and the combinations will end up poorly documented.

isPool and isPoolExt are logically almost independent. Yes, you will remove one state because it cannot be isPoolExt without it being isPool. The hackier solution is to use the size of the poolExtOp to decide if extension has to be applied.

vmurali marked an inline comment as done. Dec 19 2022, 3:50 PM

vmurali added inline comments.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1802	Aah then we have to explicitly make sure there's only one operand. I will change that
1861–1864	I believe this check has already been performed
1899	Okay maybe I am weird in thinking about conditions this way 🤣. But, once we have more complicated conditions like what we have here, having an OR of negations is not enough. One has to negate complicated expressions leading to some complex expression containing ANDs of some negations and some literals as is, all ORed together. At that point it's just impossible to reason about. Instead, I would just list all the valid options ORed together, with each valid option being a bunch of ANDed expressions, and negate the whole list.

I did not mean to create a new transformation for pooling. I'd suggest that we consider going with an enum and refactoring things out. After reviewing more details, I still feel that it's better to have an enum. It helps people like me start thinking about which parts can be refactored out or organized together. It also hints to others which parts should be modified if there are new kinds of LinalgConvolutionOpInterface ops.

Anyway, I'm not marking it as "required to be fixed". It's more about making sure it has been considered. (Because I do not see the contradictory feedback and the discussions in the review thread.)

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1861–1864	I don't know if the check has been performed or not. I look into the struct members and I can only say that it is a LinalgOp. A LinalgOp does not have to be LinalgConvolutionOpInterface. But now I figure out why we check this. It is because the generator works for Conv1D. And now we are updating the Conv1D definition to consider Pool1D cases. In this context, should we add back the check about `resShapedType.getRank() != 3`? (IMO, we should have LinalgConvolutionOpInterface::is1D method and move the check to there. But I think it should not be a blocker of the pooling vectorization patch.)
1877	Having the check is better than commenting it. And we can use `switch-case` for this kind of code.
1899	I can point out the if-cond that I think can be the way; mark them an optional comment..
1905–1907	we can just update the error message to `conv/pool`.. The notification already includes the op. People can look into the op name and figure out what's happening.
2017	Here is an example where having an enum is better. We can have better documentation and write code like `// The reduce op of a convolution op is a binary op.` `if (enum == ConvEnum) rhs = ...`. It's more meaningful to me, and it accounts for the case where there are more kinds of LinalgConvolutionOpInterface ops.
2042–2048	I believe the compiler is smart for handling it, but can we just swap `for if` to `if for`? :)
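The `for if` to `if for` swap requested for 2042–2048 amounts to hoisting a loop-invariant condition out of the loop. A minimal sketch with hypothetical functions that are unrelated to the patch's actual loop body:

```cpp
#include <vector>

// "for if": the loop-invariant flag is re-tested on every iteration.
int sumForIf(const std::vector<int> &xs, bool doubleIt) {
  int sum = 0;
  for (int x : xs) {
    if (doubleIt)
      sum += 2 * x;
    else
      sum += x;
  }
  return sum;
}

// "if for": the flag is tested once and each branch owns its own loop.
int sumIfFor(const std::vector<int> &xs, bool doubleIt) {
  int sum = 0;
  if (doubleIt) {
    for (int x : xs)
      sum += 2 * x;
    return sum;
  }
  for (int x : xs)
    sum += x;
  return sum;
}
```

Both functions compute the same result. The second form does not rely on the compiler to hoist the test, which matters in unoptimized builds.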

(Because I do not see the contradictory feedback and the discussions in the review thread.)

I meant the feedback about keeping more code common than separate.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1861–1864	`resShapedType.getRank() != 3` is performed after I figure out it's convolution and not pooling (line 1918). I suppose I can refactor it in and make the check earlier, since both pooling and convolution have rank 3.
2017	Aah I didn't realize you meant Conv vs Pool enum. I assumed you meant an enum combining `isPool` and `isPoolExt`, which seems unnecessary.

hanchung added inline comments. Dec 20 2022, 10:51 AM

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
2017	I meant an enum which may have `conv, pool, poolext` or just `conv, pool`.
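A sketch of the enum idea discussed for line 2017, assuming the two-value variant (`conv`, `pool`). The names `ConvOperationKind` and `describeReduction` are hypothetical and the patch's actual identifiers may differ. The returned strings only paraphrase the thread: convolution is always a sum of products, while pooling uses one of several combiners.

```cpp
#include <string>

// Hypothetical names; not the identifiers used in Vectorization.cpp.
enum class ConvOperationKind { Conv, Pool };

// Dispatching on the enum keeps each kind's handling, and its
// documentation, in one clearly labeled place.
std::string describeReduction(ConvOperationKind kind) {
  switch (kind) {
  case ConvOperationKind::Conv:
    // The reduce op of a convolution op is a binary op over products.
    return "sum of products";
  case ConvOperationKind::Pool:
    // Pooling applies one of several combiners (e.g. add, min, max).
    return "combiner over the input window";
  }
  return "unknown";
}
```

If a third kind of LinalgConvolutionOpInterface op were added, a `switch` without a `default` lets the compiler warn about every place that needs updating, which a pair of booleans cannot do.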

Addressing all of Hanhan's comments

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
2017	`poolExt` is a subset of `pool`, just to see if there's a cast during the pooling operation. And it's used in exactly one place, so it shouldn't be conflated with `conv` and `pool`

vmurali marked an inline comment as done. Dec 20 2022, 2:40 PM

Clang format

Harbormaster completed remote builds in B204246: Diff 484387. Dec 20 2022, 3:41 PM

Thanks for addressing the feedback! I did another pass.

There's contradictory feedback here about separating out pooling (the earlier review, for instance). There are only a few places where isPool is checked (8 places after the constructor, 4 of which can be removed without affecting functionality, so effectively just 4 places) and the amount of shared code is big. When we had brainstormed the design, we explicitly decided to not create a new transformation for pooling. Pooling ops implement the convolution interface, precisely to combine it with convolution.

I think @hanchung's feedback is similar to mine. We both feel that something is off but we haven't written the original code so we can't make a specific suggestion without spending much more time on the code. The fact that we have code that conditionally depends on the Conv or Pool enum values indicates that we are conflating disjoint implementations into a single one and that the design would benefit from using hierarchy or overloading to separate the Conv or Pool specific code from the shared main algorithm. I'm ok if this is not addressed as part of this patch but it's very likely that more specific code is added in the near future as we add support for more Conv and Pool cases so it would be better to do some refactoring now that the implementation is simpler. If you think that having a separate transformation for pooling, with common utilities shared with conv, should be reconsidered... that should be fine! I personally can't suggest anything more specific without diving more into the code and the structured generator infra.

Hopefully that helps!

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1825	Sorry, I missed Rob's comment. In general, utility functions make sense. However, for these specific and simple checks that are used all over the compiler there are some trade-offs that I would consider. If the check is used only once, it's probably simpler to parse the check directly (developers are very used to these kind of `isa` checks) than perhaps a complex name (e.g., `isBlockArgumentOrCastOfBlockArgument`). If you still feel that giving this a name is more convenient for a single use, a local named lambda might work better. The other problem I see with adding static utilities for these checks is that there could easily be a combinatorial explosion of them and, most importantly, we may create a style or API for these checks local to this specific file, which wouldn't be aligned with the rest of the compiler and it would probably be used inconsistently by those not aware of it. Anyways, too much discussion for this subjective nit comment :). I hope with all this information in place you can make a good call.
1861–1864	Per other discussion: `!(A == B && C == D)` -> `A != B || C != D`. Can we also add a comment about what these checks mean?
1873–1874	Probably better to save `reduceOp` and avoid a pool specific name to keep it generic
1875–1911	Can we move this switch case above to a utility function? It looks like this is matching the convolution or pool kind so naming that accordingly would help readability.
1882	I found this check confusing: how could `oper == Pool` and `isPoolingOp` return false? Should we rename `isPoolingOp` to something like `isSupportedPoolCombiner`?
1906	https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements
2042–2048	Not necessarily in `Debug` mode, which is a compiler mode we should also optimize for efficient debugging :)
2043–2048	Per LLVM guidelines, please finish comments with period. https://llvm.org/docs/CodingStandards.html#id8
mlir/test/Dialect/Linalg/vectorize-convolution.mlir
575	Missing `-----`?
585	We usually use `CHECK-LABEL` to match the function name. It can make the test more resilient to accidental checks.
599	Please, use `CHECK-DAG` only when absolutely necessary. It makes the mapping more expensive and can make debugging an actual nightmare. For example, it's common to use DAG for constants, maps, globals but I don't think we need it here for anything else.

In D140188#4027033, @dcaballe wrote:

Thanks for addressing the feedback! I did another pass.

There's contradictory feedback here about separating out pooling (the earlier review, for instance). There are only a few places where isPool is checked (8 places after the constructor, 4 of which can be removed without affecting functionality, so effectively just 4 places) and the amount of shared code is big. When we had brainstormed the design, we explicitly decided to not create a new transformation for pooling. Pooling ops implement the convolution interface, precisely to combine it with convolution.

I think @hanchung's feedback is similar to mine. We both feel that something is off but we haven't written the original code so we can't make a specific suggestion without spending much more time on the code. The fact that we have code that conditionally depends on the Conv or Pool enum values indicates that we are conflating disjoint implementations into a single one and that the design would benefit from using hierarchy or overloading to separate the Conv or Pool specific code from the shared main algorithm. I'm ok if this is not addressed as part of this patch but it's very likely that more specific code is added in the near future as we add support for more Conv and Pool cases so it would be better to do some refactoring now that the implementation is simpler. If you think that having a separate transformation for pooling, with common utilities shared with conv, should be reconsidered... that should be fine! I personally can't suggest anything more specific without diving more into the code and the structured generator infra.

Hopefully that helps!

I think @rsuderman was suggesting to keep more code in common. Anyway, I still think these two ops are so close to each other that we have to share a lot of code between them. We can go over cleaning it up further in a future patch.

vmurali marked 7 inline comments as done. Jan 4 2023, 2:35 PM

vmurali added inline comments.

mlir/test/Dialect/Linalg/vectorize-convolution.mlir
585	I was basically following the style of the convolution functions. I can change these.

Addressing Diego's comments

Harbormaster completed remote builds in B205779: Diff 486399. Jan 4 2023, 3:04 PM

hanchung added inline comments. Jan 5 2023, 11:21 AM

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1863–1865	The comment and the check mismatch. One is RHS and the other is RES. Should this be `||`, not `&&`?
1881	Declaring a variable for `rhsShapedType.getRank()` is easier to parse. And we can write `if (!A && !B)` instead of `if (!(A || B))`.
1916–1917	Where is the `f` dimension? Shouldn't this be `bindShapeDims(rhsShapedType, kwSize, cSize, fSize);`?
1994–1997	There could be more enum values, so the comment might be inaccurate. We should either update the comment or the if-check. The comment should also be combined with the previous comment.
2063	nit: s/Perform/perform
2130–2132	please remove braces for single statement...
2388	maybe rename it to `OperType` or `OperKind`?
2407–2409	nit: we usually name it with `numBlockArguments`.
mlir/test/Dialect/Linalg/vectorize-convolution.mlir
586	why not name them to `INPUT`, `FILTER`, `OUTPUT`? It can match the naming in the input MLIR better and help reviewing the test. I don't know if we have naming style guide for lit test, but we should keep them consistent. All the variables in this file are upper cases with `_`.

Thanks a lot for addressing all the comments! It looks great to me! Please, wait for others to finish the review before committing. Thanks!

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1874–1876	you can use `auto` for this type.
2406	We use `LogicalResult` instead of `bool` for this.

dcaballe accepted this revision. Jan 5 2023, 11:46 AM

vmurali added inline comments. Jan 5 2023, 2:42 PM

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1863–1865	Yes, the code that I changed from my previous version is wrong :). This is precisely why I think we should just implement what is written in the comments instead of applying De Morgan's law and then implementing it, even if it doesn't conform to what exists currently in the LLVM/MLIR code base - it eliminates trivial bugs like this, and is probably easier to optimize for the C++ compiler than for humans.

vmurali marked an inline comment as done. Jan 5 2023, 2:44 PM

vmurali added inline comments.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1916–1917	f is already bound using outputs, so I don't rebind it now.

Addressed Hanhan's and Diego's comments

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1881	I have changed `(!(A || B || C))` into its distributed equivalent (applying De Morgan's law), but as I said in my earlier comment, I think doing this conversion is error prone (at least for me, as demonstrated in my previous revision, when I did this transform in a different location), especially when the conditions become more complicated. It's easier to list all the acceptable alternatives using an OR (where each alternative itself is a complicated expression). Since the semantics of this function is to return whenever we hit a failure, we should just negate the list.
1994–1997	I am not sure if this code can handle any operation other than Conv and Pooling. I can fix the comment saying this is done only for Conv currently.
2406	I think it just clutters the code (all the returns based on expressions should be converted into LogicalResult using the constructor, example in line 2409). It's probably needed if it's a transformation pass, but this is a helper utility which has nothing to do with LLVM.
mlir/test/Dialect/Linalg/vectorize-convolution.mlir
586	These tests were auto constructed from the output after examining the outputs to be correct, and my script doesn't transform the names other than adding a "V". I can fix it manually, but it's probably not very important, given that these tests will fail appropriately. There are too many variations of convolution to write these tests by hand.

This revision was not accepted when it landed; it landed in state Needs Review. Jan 5 2023, 3:16 PM

Closed by commit rG755e776849be: [mlir][linalg] Vectorize 1D convolution (authored by vmurali).

This revision was automatically updated to reflect the committed changes.

vmurali marked 3 inline comments as done.

vmurali added a commit: rG755e776849be: [mlir][linalg] Vectorize 1D convolution.

dcaballe added inline comments. Jan 6 2023, 10:40 AM

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
2406	We only have to s/true/success() and s/false/failure(), right? `LogicalResult` is used in general all over the place to signal if something succeeded or failed. It's kind of a convention. It provides more context than just a bool, which could be interpreted in different ways... but most importantly, it forces the caller to check if the result is success or failure and handle both outcomes properly (if the return value of the function is ignored you will get a warning). In general, there are a few benefits and I think this is a common case where it's used.
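To illustrate the point about ignored return values, here is a minimal stand-in for the idea behind `LogicalResult`. It is not MLIR's class: `Result`, the helpers below, and `verifyNumBlockArguments` are hypothetical, and the sketch assumes C++17 for `[[nodiscard]]` on a type.

```cpp
// Stand-in result type; NOT mlir::LogicalResult. Marking the type
// [[nodiscard]] makes the compiler warn when a returned value is dropped.
struct [[nodiscard]] Result {
  bool ok;
};

inline Result success() { return {true}; }
inline Result failure() { return {false}; }
inline bool succeeded(Result r) { return r.ok; }
inline bool failed(Result r) { return !r.ok; }

// Hypothetical helper: the s/true/success() and s/false/failure() rewrite
// applied to a simple bool-returning check.
Result verifyNumBlockArguments(int actual, int expected) {
  if (actual != expected)
    return failure();
  return success();
}
```

A caller would then write `if (failed(verifyNumBlockArguments(n, 3))) return;`. A bare `verifyNumBlockArguments(n, 3);` statement would trigger a compiler warning, whereas a plain `bool` return could be silently ignored.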

Revision Contents

Path	Size
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp	257 lines
mlir/test/Dialect/Linalg/vectorize-convolution.mlir	279 lines

Diff 486691

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp

[379 lines collapsed]
 mlir::linalg::getCombinerOpKind(Operation *combinerOp) {
   if (!combinerOp)
     return std::nullopt;
   return llvm::TypeSwitch<Operation *, std::optional<CombiningKind>>(combinerOp)
       .Case<arith::AddIOp, arith::AddFOp>(
           [&](auto op) { return CombiningKind::ADD; })
       .Case<arith::AndIOp>([&](auto op) { return CombiningKind::AND; })
       .Case<arith::MaxSIOp>([&](auto op) { return CombiningKind::MAXSI; })
+      .Case<arith::MaxUIOp>([&](auto op) { return CombiningKind::MAXUI; })
       .Case<arith::MaxFOp>([&](auto op) { return CombiningKind::MAXF; })
       .Case<arith::MinSIOp>([&](auto op) { return CombiningKind::MINSI; })
+      .Case<arith::MinUIOp>([&](auto op) { return CombiningKind::MINUI; })
       .Case<arith::MinFOp>([&](auto op) { return CombiningKind::MINF; })
       .Case<arith::MulIOp, arith::MulFOp>(
           [&](auto op) { return CombiningKind::MUL; })
       .Case<arith::OrIOp>([&](auto op) { return CombiningKind::OR; })
       .Case<arith::XOrIOp>([&](auto op) { return CombiningKind::XOR; })
       .Default([&](auto op) { return std::nullopt; });
 }

[1,393 lines collapsed]

 /// Bind a pack of int& to the leading dimensions of shapedType.getShape().
 template <typename... IntTy>
 static void bindShapeDims(ShapedType shapedType, IntTy &...vals) {
   bindShapeDims<0>(shapedType, vals...);
 }

 namespace {
+bool isCastOfBlockArgument(Operation *op) {
+  return isa<CastOpInterface>(op) && op->getNumOperands() == 1 &&
+         op->getOperand(0).isa<BlockArgument>();
+}
+
+bool isSupportedPoolKind(vector::CombiningKind kind) {
+  switch (kind) {
+  case vector::CombiningKind::ADD:
+  case vector::CombiningKind::MAXF:
+  case vector::CombiningKind::MAXSI:
+  case vector::CombiningKind::MAXUI:
+  case vector::CombiningKind::MINF:
+  case vector::CombiningKind::MINSI:
+  case vector::CombiningKind::MINUI:
+    return true;
+  default:
+    return false;
+  }
+}

 /// Generate a vector implementation for either:
 /// ```
 /// Op def: ( n, w, c, kw, f )
 /// Iters: ({Par(), Par(), Par(), Red(), Red()})
 /// Layout: {{n, strideW * w + dilationW * kw, c}, {kw, c, f}, {n, w, f}}
 /// ```
 /// kw is unrolled, w is unrolled iff dilationW > 1.
 ///
 /// or
 ///
 /// ```
 /// Op def: ( n, c, w, f, kw )
 /// Iters: ({Par(), Par(), Par(), Red(), Red()})
Show All 19 Lines	Conv1DGenerator(RewriterBase &rewriter, LinalgOp linalgOp, int strideW,
if (linalgOp.getNumDpsInputs() != 2 || linalgOp.getNumDpsInits() != 1)		if (linalgOp.getNumDpsInputs() != 2 || linalgOp.getNumDpsInits() != 1)
return;		return;
lhsShaped = linalgOp.getDpsInputOperand(0)->get();		lhsShaped = linalgOp.getDpsInputOperand(0)->get();
rhsShaped = linalgOp.getDpsInputOperand(1)->get();		rhsShaped = linalgOp.getDpsInputOperand(1)->get();
resShaped = linalgOp.getDpsInitOperand(0)->get();		resShaped = linalgOp.getDpsInitOperand(0)->get();
lhsShapedType = lhsShaped.getType().dyn_cast<ShapedType>();		lhsShapedType = lhsShaped.getType().dyn_cast<ShapedType>();
rhsShapedType = rhsShaped.getType().dyn_cast<ShapedType>();		rhsShapedType = rhsShaped.getType().dyn_cast<ShapedType>();
resShapedType = resShaped.getType().dyn_cast<ShapedType>();		resShapedType = resShaped.getType().dyn_cast<ShapedType>();
if (!lhsShapedType || !rhsShapedType || !resShapedType)		if (!lhsShapedType || !rhsShapedType || !resShapedType)
return;		return;
if (lhsShapedType.getRank() != 3 ||		// LHS has dimension NCW/NWC and RES has dimension NFW/NCW/NWF/NWC.
(rhsShapedType.getRank() != 2 && rhsShapedType.getRank() != 3) ||		if (lhsShapedType.getRank() != 3 || resShapedType.getRank() != 3)
		hanchung: Should we just check if the op is a LinalgConvolutionOpInterface? Maybe @antiagainst should weigh in because he's the author of the implementation.
		vmurali (Author): I believe this check has already been performed.
		hanchung: I don't know if the check has been performed or not. Looking at the struct members, I can only say that it is a LinalgOp, and a LinalgOp does not have to be a LinalgConvolutionOpInterface. But now I see why we check this: the generator works for Conv1D, and we are now updating the Conv1D definition to cover the Pool1D cases. In this context, should we add back the check about `resShapedType.getRank() != 3`? (IMO, we should have a LinalgConvolutionOpInterface::is1D method and move the check there. But I think it should not be a blocker for the pooling vectorization patch.)
		vmurali (Author): `resShapedType.getRank() != 3` is performed after I figure out it's convolution and not pooling (line 1918). I suppose I can refactor it in and make the check earlier, since both pooling and convolution have rank 3.
		dcaballe: Per other discussion: `!(A == B && C == D)` -> `A != B || C != D`. Can we also add a comment about what these checks mean?
resShapedType.getRank() != 3)
return;		return;
		hanchung: The comment and the check mismatch. One is RHS and the other is RES. Should this be `||`, not `&&`?
		vmurali (Author): Yes, the code that I changed from my previous version is wrong :). This is precisely why I think we should just implement what is written in the comments instead of applying De Morgan's law and then implementing it, even if it doesn't conform to what exists currently in the LLVM/MLIR code base - it eliminates trivial bugs like this, and is probably easier to optimize for the C++ compiler than for humans.
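To make the De Morgan point concrete, here is a small self-contained sketch comparing the negated form, the distributed form, and the kind of slip discussed above (assuming the bug was an `&&` left in place where `||` was needed). The function names are hypothetical and the ranks are plain integers rather than `ShapedType` queries.

```cpp
#include <cstdint>

// Negated form: state the valid case, then negate it.
bool rejectNegated(int64_t lhsRank, int64_t resRank) {
  return !(lhsRank == 3 && resRank == 3);
}

// Distributed form (De Morgan): each term is negated and `&&` becomes `||`.
bool rejectDistributed(int64_t lhsRank, int64_t resRank) {
  return lhsRank != 3 || resRank != 3;
}

// The slip: each term is negated but `&&` is kept, so an op with only one
// wrong rank is no longer rejected.
bool rejectBuggy(int64_t lhsRank, int64_t resRank) {
  return lhsRank != 3 && resRank != 3;
}

// Exhaustively compares two predicates over ranks 0..4.
bool agreeOnSmallRanks(bool (*f)(int64_t, int64_t),
                       bool (*g)(int64_t, int64_t)) {
  for (int64_t lhs = 0; lhs <= 4; ++lhs)
    for (int64_t res = 0; res <= 4; ++res)
      if (f(lhs, res) != g(lhs, res))
        return false;
  return true;
}
```

The first two forms agree on every input; the third differs whenever exactly one of the ranks is 3, for example `(3, 2)`.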

// Check for reduction `add` preceded by `mul`.
Operation *reduceOp = matchLinalgReduction(linalgOp.getDpsInitOperand(0));		Operation *reduceOp = matchLinalgReduction(linalgOp.getDpsInitOperand(0));
if (!reduceOp)		if (!reduceOp)
return;		return;
std::optional<vector::CombiningKind> maybeKind;		redOp = reduceOp->getName().getIdentifier();
maybeKind = getCombinerOpKind(reduceOp);
if (!maybeKind || *maybeKind != vector::CombiningKind::ADD)		if (!setOperKind(reduceOp))
return;
// Check for single `mul` predecessor. The `mul` operands must be block
// arguments or extension of block arguments.
Operation *mulOp = nullptr;
for (Value operand : reduceOp->getOperands()) {
if (operand.isa<BlockArgument>())
continue;
if (mulOp)
return;		return;
mulOp = operand.getDefiningOp();		auto maybeKind = getCombinerOpKind(reduceOp);
		dcaballe: Probably better to save `reduceOp` and avoid a pool-specific name to keep it generic.
if (!mulOp || !isa<arith::MulIOp, arith::MulFOp>(mulOp))		if (!(maybeKind && (*maybeKind == vector::CombiningKind::ADD ||
		(oper == Pool && isSupportedPoolKind(*maybeKind))))) {
		dcaballe: You can use `auto` for this type.
return;		return;
		hanchung: Having the check is better than commenting it. And we can use `switch-case` for this kind of code.
}		}
if (!mulOp)
return;		auto rhsRank = rhsShapedType.getRank();
for (Value operand : mulOp->getOperands()) {		switch (oper) {
		hanchung: Declaring a variable for `rhsShapedType.getRank()` is easier to parse. And we can write `if (!A && !B)` instead of `if (!(A || B))`.
		vmurali (Author): I have changed `(!(A || B || C))` into its distributed equivalent (applying De Morgan's law), but as I said in my earlier comment, I think doing this conversion is error prone (at least for me, as demonstrated in my previous revision, when I did this transform in a different location), especially when the conditions become more complicated. It's easier to list all the acceptable alternatives using an OR (where each alternative itself is a complicated expression). Since the semantics of this function is to call return whenever we hit a failure, we should just negate the list.
if (Operation *def = operand.getDefiningOp()) {		case Conv:
		dcaballe: I found this check confusing: how could `oper == Pool` and `isPoolingOp` return false? Should we rename `isPoolingOp` to something like `isSupportedPoolCombiner`?
if (!isa<CastOpInterface>(def))		if (rhsRank != 2 && rhsRank!= 3)
return;		return;
operand = def->getOperand(0);		break;
}		case Pool:
if (!operand.isa<BlockArgument>())		if (rhsRank != 1)
		dcaballe: Is it possible to simplify this code? If I parse this correctly, the loop body beyond the `isa<BlockArgument>` check will be executed only once, as `feedOp = getDefiningOp()` will never be null. I would perhaps add a few simpler checks using `llvm::count_if` to check that there is only one operand that is not a block argument, then `llvm::find_if` to get it, and move the remaining code out of the loop. Just a quick suggestion, there must be other ways. Perhaps it is also a good idea to separate the two `If` described in the comments if that simplifies the code. I would prioritize simplicity over performance as the number of operands is very small.
		vmurali (Author): I just modified what existed earlier - look at `mulOp` in the earlier code. But yes, I am just checking if there's at most one non-BlockArgument, and if I find it, I make sure it's either a MulOp of block arguments or casts of block arguments (convolution), or a cast of a block argument (pooling).
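A minimal sketch of the count-then-find shape suggested above, written against the standard library so it compiles on its own. `std::count_if` and `std::find_if` stand in for `llvm::count_if` and `llvm::find_if`, and `FakeOperand` and `findSingleNonBlockArg` are hypothetical names, not code from the patch. The sketch follows the "exactly one" reading of the suggestion; the author's description ("at most one") would also accept zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for an operand; not an MLIR type.
struct FakeOperand {
  bool isBlockArgument;
};

// Returns the index of the single operand that is not a block argument, or
// std::nullopt when there is no such operand or more than one.
std::optional<std::size_t>
findSingleNonBlockArg(const std::vector<FakeOperand> &operands) {
  auto isNotBlockArg = [](const FakeOperand &operand) {
    return !operand.isBlockArgument;
  };
  if (std::count_if(operands.begin(), operands.end(), isNotBlockArg) != 1)
    return std::nullopt;
  auto it = std::find_if(operands.begin(), operands.end(), isNotBlockArg);
  return static_cast<std::size_t>(it - operands.begin());
}
```

With the operand located up front, the follow-up checks (mul of block arguments, cast of a block argument) can live outside any loop.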
return;		return;
		break;
}		}
// The op is now known to be valid.		// The op is now known to be valid.
valid = true;		valid = true;
}		}

/// Generate a vector implementation for:		/// Generate a vector implementation for:
/// ```		/// ```
/// Op def: ( n, w, c, kw, f )		/// Op def: ( n, w, c, kw, f )
		rsuderman: This block is quite difficult to parse. I would recommend factoring it out into a helper or reworking it to avoid multiple nested if/else statements. This could be as simple as getting your three block operands, checking for casts, seeing if there is a mul between input/kernel, then finding the combiner with the accumulator. Then special checks (e.g. finding the combiner) can be done by helpers to avoid nesting the list inside of the entire block. It would provide a more step-by-step process and avoid parsing deep parts of the implementation.
		dcaballe: What about something like `if (!maybeKind || !isSupportedConvKind(isPool)) return;`?
		vmurali (Author): I think there are other CombiningKinds that are not used in Convolution or Pooling. We should have an explicit list for convolution and pooling and not rely on not-convolution being pooling.
/// Iters: ({Par(), Par(), Par(), Red(), Red()})		/// Iters: ({Par(), Par(), Par(), Red(), Red()})
/// Layout: {{n, strideW * w + dilationW * kw, c}, {kw, c, f}, {n, w, f}}		/// Layout: {{n, strideW * w + dilationW * kw, c}, {kw, c, f}, {n, w, f}}
		dcaballe: nit: `!(A == B && C == D)` -> `A != B || C != D`? It seems easier to parse.
		vmurali (Author): For me, `!(A && B)` is a lot more readable because I enumerate the valid conditions in my head and simply negate it for invalid.
		hanchung: I think we should follow the pattern that most people use in MLIR/LLVM. In my experience, I've seen the `A != B || C != D` form a lot in code browsing and code review. I seldom see the `!(A && B)` form in the codebase.
		vmurali (Author): Okay maybe I am weird in thinking about conditions this way 🤣. But, once we have more complicated conditions like what we have here, having an OR of negations is not enough. One has to negate complicated expressions, leading to some complex expression containing ANDs of some negations and some literals as is, all ORed together. At that point it's just impossible to reason about. Instead, just listing all the valid options as OR, with the valid options containing a bunch of ANDed expressions for that option
		hanchung: I can point out the if-conditions where I think this form works and mark them as optional comments.
/// ```		/// ```
/// kw is always unrolled.		/// kw is always unrolled.
/// TODO: w (resp. kw) is unrolled when the strideW ( resp. dilationW) is		/// TODO: w (resp. kw) is unrolled when the strideW ( resp. dilationW) is
/// > 1.		/// > 1.
FailureOr<Operation *> conv(Conv1DOpOrder conv1DOpOrder) {		FailureOr<Operation *> conv(Conv1DOpOrder conv1DOpOrder) {
if (!valid)		if (!valid)
return rewriter.notifyMatchFailure(op, "unvectorizable 1-D conv");		return rewriter.notifyMatchFailure(op, "unvectorizable 1-D conv/pool");
		dcaballe: https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements

		hanchung: We can just update the error message to `conv/pool`. The notification already includes the op. People can look into the op name and figure out what's happening.
int64_t nSize, wSize, cSize, kwSize, fSize;		int64_t nSize, wSize, cSize, kwSize, fSize;
SmallVector<int64_t, 3> lhsShape, rhsShape, resShape;		SmallVector<int64_t, 3> lhsShape, rhsShape, resShape;
switch (conv1DOpOrder) {		switch (conv1DOpOrder) {
case Conv1DOpOrder::Nwc:		case Conv1DOpOrder::Nwc:
		dcaballe: Can we move this switch case above to a utility function? It looks like this is matching the convolution or pool kind so naming that accordingly would help readability.
// kernel{kw, c, f}
bindShapeDims(rhsShapedType, kwSize, cSize, fSize);
// out{n, w, f}		// out{n, w, f}
bindShapeDims(resShapedType, nSize, wSize);		bindShapeDims(resShapedType, nSize, wSize, fSize);
		switch (oper) {
		case Conv:
		// kernel{kw, c, f}
		bindShapeDims(rhsShapedType, kwSize, cSize);
		hanchung: Where is the `f` dimension? Shouldn't this be `bindShapeDims(rhsShapedType, kwSize, cSize, fSize);`?
		vmurali (Author): f is already bound using outputs, so I don't rebind it now.
		break;
		case Pool:
		// kernel{kw}
		bindShapeDims(rhsShapedType, kwSize);
		cSize = fSize;
		break;
		}
lhsShape = {nSize,		lhsShape = {nSize,
// iw = ow * sw + kw * dw - 1		// iw = ow * sw + kw * dw - 1
// (i.e. 16 convolved with 3 (@stride 1 dilation 1) -> 14)		// (i.e. 16 convolved with 3 (@stride 1 dilation 1) -> 14)
// Perform the proper inclusive -> exclusive -> inclusive.		// Perform the proper inclusive -> exclusive -> inclusive.
((wSize - 1) * strideW + 1) + ((kwSize - 1) * dilationW + 1) -		((wSize - 1) * strideW + 1) + ((kwSize - 1) * dilationW + 1) -
1,		1,
cSize};		cSize};
		switch (oper) {
		case Conv:
rhsShape = {kwSize, cSize, fSize};		rhsShape = {kwSize, cSize, fSize};
		break;
		case Pool:
		rhsShape = {kwSize};
		break;
		}
resShape = {nSize, wSize, fSize};		resShape = {nSize, wSize, fSize};
break;		break;
		rsuderman: There appears to be a decent amount of shared implementation with the convolution implementation. I would recommend maintaining shared lines for shared implementation.
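One piece that is visibly duplicated between the `Nwc` and `Ncw` cases is the input-width expression. As a sketch of what sharing it could look like, the hypothetical helper below (the name `inputWidth` is an assumption, not part of the patch) wraps the same arithmetic that appears in both `lhsShape` initializers.

```cpp
#include <cstdint>

// Hypothetical helper, not part of the patch: input width needed to produce
// `wSize` outputs with a window of `kwSize` taps.
int64_t inputWidth(int64_t wSize, int64_t kwSize, int64_t strideW,
                   int64_t dilationW) {
  // Same expression as in both switch cases:
  // inclusive -> exclusive -> inclusive.
  return ((wSize - 1) * strideW + 1) + ((kwSize - 1) * dilationW + 1) - 1;
}
```

This matches the example in the code comment: an output width of 14 with a kernel of 3 at stride 1 and dilation 1 gives `inputWidth(14, 3, 1, 1) == 16`.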
case Conv1DOpOrder::Ncw:		case Conv1DOpOrder::Ncw:
// kernel{f, c, kw}
bindShapeDims(rhsShapedType, fSize, cSize, kwSize);
// out{n, f, w}		// out{n, f, w}
bindShapeDims(resShapedType, nSize, fSize, wSize);		bindShapeDims(resShapedType, nSize, fSize, wSize);
		switch (oper) {
		case Conv:
		// kernel{f, c, kw}
		bindShapeDims(rhsShapedType, fSize, cSize, kwSize);
		break;
		case Pool:
		// kernel{kw}
		bindShapeDims(rhsShapedType, kwSize);
		cSize = fSize;
		break;
		}
lhsShape = {nSize, cSize,		lhsShape = {nSize, cSize,
// iw = ow * sw + kw * dw - 1		// iw = ow * sw + kw * dw - 1
// (i.e. 16 convolved with 3 (@stride 1 dilation 1) -> 14)		// (i.e. 16 convolved with 3 (@stride 1 dilation 1) -> 14)
// Perform the proper inclusive -> exclusive -> inclusive.		// Perform the proper inclusive -> exclusive -> inclusive.
((wSize - 1) * strideW + 1) + ((kwSize - 1) * dilationW + 1) -		((wSize - 1) * strideW + 1) + ((kwSize - 1) * dilationW + 1) -
1};		1};
		switch (oper) {
		case Conv:
rhsShape = {fSize, cSize, kwSize};		rhsShape = {fSize, cSize, kwSize};
		break;
		case Pool:
		rhsShape = {kwSize};
		break;
		}
resShape = {nSize, fSize, wSize};		resShape = {nSize, fSize, wSize};
break;		break;
		rsuderman: Ditto about the shared implementation.
}		}

vector::TransferWriteOp write;		vector::TransferWriteOp write;
Value zero = rewriter.create<arith::ConstantIndexOp>(loc, 0);		Value zero = rewriter.create<arith::ConstantIndexOp>(loc, 0);

// w is unrolled (i.e. wSizeStep == 1) iff strideW > 1.		// w is unrolled (i.e. wSizeStep == 1) iff strideW > 1.
// When strideW == 1, we can batch the contiguous loads and avoid		// When strideW == 1, we can batch the contiguous loads and avoid
// unrolling		// unrolling
int64_t wSizeStep = strideW == 1 ? wSize : 1;		int64_t wSizeStep = strideW == 1 ? wSize : 1;

Type lhsEltType = lhsShapedType.getElementType();		Type lhsEltType = lhsShapedType.getElementType();
Type rhsEltType = rhsShapedType.getElementType();		Type rhsEltType = rhsShapedType.getElementType();
Type resEltType = resShapedType.getElementType();		Type resEltType = resShapedType.getElementType();
auto lhsType = VectorType::get(lhsShape, lhsEltType);		auto lhsType = VectorType::get(lhsShape, lhsEltType);
auto rhsType = VectorType::get(rhsShape, rhsEltType);		auto rhsType = VectorType::get(rhsShape, rhsEltType);
auto resType = VectorType::get(resShape, resEltType);		auto resType = VectorType::get(resShape, resEltType);
// Read lhs slice of size {w * strideW + kw * dilationW, c, f} @ [0, 0,		// Read lhs slice of size {w * strideW + kw * dilationW, c, f} @ [0, 0,
// 0].		// 0].
Value lhs = rewriter.create<vector::TransferReadOp>(		Value lhs = rewriter.create<vector::TransferReadOp>(
loc, lhsType, lhsShaped, ValueRange{zero, zero, zero});		loc, lhsType, lhsShaped, ValueRange{zero, zero, zero});
// Read rhs slice of size {kw, c, f} @ [0, 0, 0].		// Read rhs slice of size {kw, c, f} @ [0, 0, 0].
Value rhs = rewriter.create<vector::TransferReadOp>(		// This is needed only for Conv.
		Value rhs = nullptr;
		if (oper == Conv)
		rhs = rewriter.create<vector::TransferReadOp>(
loc, rhsType, rhsShaped, ValueRange{zero, zero, zero});		loc, rhsType, rhsShaped, ValueRange{zero, zero, zero});
		hanchung: There could be more enum values. The comment might be inaccurate. We should either update the comment or the if-check. The comment should be combined with the previous comment as well.
		vmurali (Author): I am not sure if this code can handle any operation other than Conv and Pooling. I can fix the comment saying this is done only for Conv currently.
// Read res slice of size {n, w, f} @ [0, 0, 0].		// Read res slice of size {n, w, f} @ [0, 0, 0].
Value res = rewriter.create<vector::TransferReadOp>(		Value res = rewriter.create<vector::TransferReadOp>(
loc, resType, resShaped, ValueRange{zero, zero, zero});		loc, resType, resShaped, ValueRange{zero, zero, zero});

// The base vectorization case is input: {n,w,c}, weight: {kw,c,f}, output:		// The base vectorization case is input: {n,w,c}, weight: {kw,c,f}, output:
// {n,w,f}. To reuse the base pattern vectorization case, we do pre		// {n,w,f}. To reuse the base pattern vectorization case, we do pre
// transpose on input, weight, and output.		// transpose on input, weight, and output.
switch (conv1DOpOrder) {		switch (conv1DOpOrder) {
case Conv1DOpOrder::Nwc:		case Conv1DOpOrder::Nwc:
// Base case, so no transposes necessary.		// Base case, so no transposes necessary.
break;		break;
case Conv1DOpOrder::Ncw: {		case Conv1DOpOrder::Ncw: {
// To match base vectorization case, we pre-transpose current case.		// To match base vectorization case, we pre-transpose current case.
// ncw -> nwc		// ncw -> nwc
static constexpr std::array<int64_t, 3> permLhs = {0, 2, 1};		static constexpr std::array<int64_t, 3> permLhs = {0, 2, 1};
lhs = rewriter.create<vector::TransposeOp>(loc, lhs, permLhs);		lhs = rewriter.create<vector::TransposeOp>(loc, lhs, permLhs);
// fcw -> wcf		// fcw -> wcf
static constexpr std::array<int64_t, 3> permRhs = {2, 1, 0};		static constexpr std::array<int64_t, 3> permRhs = {2, 1, 0};

		// This is needed only for Conv.
		hanchung: Here is an example where having an enum is better. We can have better documentation and write code like `// The reduce op of a convolution op is a binary op.` followed by `if (enum == ConvEnum) rhs = ...`. It's more meaningful to me, and consider the case where there are more kinds of LinalgConvolutionOpInterface ops.
		vmurali (Author): Aah I didn't realize you meant Conv vs Pool enum. I assumed you meant an enum combining `isPool` and `isPoolExt`, which seems unnecessary.
		hanchung: I meant an enum which may have `conv, pool, poolext` or just `conv, pool`.
		vmurali (Author): `poolExt` is a subset of `pool`, just to see if there's a cast during the pooling operation. And it's used in exactly one place, so it shouldn't be conflated with `conv` and `pool`.
		if (oper == Conv)
rhs = rewriter.create<vector::TransposeOp>(loc, rhs, permRhs);		rhs = rewriter.create<vector::TransposeOp>(loc, rhs, permRhs);
// nfw -> nwf		// nfw -> nwf
static constexpr std::array<int64_t, 3> permRes = {0, 2, 1};		static constexpr std::array<int64_t, 3> permRes = {0, 2, 1};
res = rewriter.create<vector::TransposeOp>(loc, res, permRes);		res = rewriter.create<vector::TransposeOp>(loc, res, permRes);
break;		break;
}		}
}		}

//===------------------------------------------------------------------===//		//===------------------------------------------------------------------===//
// Begin vector-only rewrite part		// Begin vector-only rewrite part
//===------------------------------------------------------------------===//		//===------------------------------------------------------------------===//
// Unroll along kw and read slices of lhs and rhs.		// Unroll along kw and read slices of lhs and rhs.
SmallVector<Value> lhsVals, rhsVals, resVals;		SmallVector<Value> lhsVals, rhsVals, resVals;
// Extract lhs slice of size {n, wSizeStep, c} @ [0, sw * w + dw * kw, 0].		// Extract lhs slice of size {n, wSizeStep, c} @ [0, sw * w + dw * kw, 0].
for (int64_t kw = 0; kw < kwSize; ++kw) {		for (int64_t kw = 0; kw < kwSize; ++kw) {
for (int64_t w = 0; w < wSize; w += wSizeStep) {		for (int64_t w = 0; w < wSize; w += wSizeStep) {
lhsVals.push_back(rewriter.create<vector::ExtractStridedSliceOp>(		lhsVals.push_back(rewriter.create<vector::ExtractStridedSliceOp>(
loc, lhs,		loc, lhs,
/*offsets=*/ArrayRef<int64_t>{0, w * strideW + kw * dilationW, 0},		/*offsets=*/ArrayRef<int64_t>{0, w * strideW + kw * dilationW, 0},
/*sizes=*/ArrayRef<int64_t>{nSize, wSizeStep, cSize},		/*sizes=*/ArrayRef<int64_t>{nSize, wSizeStep, cSize},
/*strides=*/ArrayRef<int64_t>{1, 1, 1}));		/*strides=*/ArrayRef<int64_t>{1, 1, 1}));
}		}
}		}
// Extract rhs slice of size {c, f} @ [kw].		// Extract rhs slice of size {c, f} @ [kw].
		// Do not do for pooling.
		if (oper == Conv)
for (int64_t kw = 0; kw < kwSize; ++kw) {		for (int64_t kw = 0; kw < kwSize; ++kw) {
rhsVals.push_back(rewriter.create<vector::ExtractOp>(		rhsVals.push_back(rewriter.create<vector::ExtractOp>(
loc, rhs, /*offsets=*/ArrayRef<int64_t>{kw}));		loc, rhs, /*offsets=*/ArrayRef<int64_t>{kw}));
}		}
		hanchung: I believe the compiler is smart enough to handle it, but can we just swap `for if` to `if for`? :)
		dcaballe: Not necessarily in `Debug` mode, which is a compiler mode we should also optimize for efficient debugging :)
		dcaballe: Per LLVM guidelines, please finish comments with a period. https://llvm.org/docs/CodingStandards.html#id8
// Extract res slice: {n, wSizeStep, f} @ [0, w, 0].		// Extract res slice: {n, wSizeStep, f} @ [0, w, 0].
for (int64_t w = 0; w < wSize; w += wSizeStep) {		for (int64_t w = 0; w < wSize; w += wSizeStep) {
resVals.push_back(rewriter.create<vector::ExtractStridedSliceOp>(		resVals.push_back(rewriter.create<vector::ExtractStridedSliceOp>(
loc, res,		loc, res,
/*offsets=*/ArrayRef<int64_t>{0, w, 0},		/*offsets=*/ArrayRef<int64_t>{0, w, 0},
/*sizes=*/ArrayRef<int64_t>{nSize, wSizeStep, fSize},		/*sizes=*/ArrayRef<int64_t>{nSize, wSizeStep, fSize},
/*strides=*/ArrayRef<int64_t>{1, 1, 1}));		/*strides=*/ArrayRef<int64_t>{1, 1, 1}));
}		}

auto linearIndex = [&](int64_t kw, int64_t w) {		auto linearIndex = [&](int64_t kw, int64_t w) {
return kw * (wSize / wSizeStep) + w;		return kw * (wSize / wSizeStep) + w;
};		};

// Compute contraction: O{n, w, f} += I{n, sw * w + dw * kw, c} * F{c, f}		// Compute contraction: O{n, w, f} += I{n, sw * w + dw * kw, c} * F{c, f} or
		// perform simple arith operation for pooling
		hanchung: nit: s/Perform/perform
for (int64_t kw = 0; kw < kwSize; ++kw) {		for (int64_t kw = 0; kw < kwSize; ++kw) {
for (int64_t w = 0; w < wSize; w += wSizeStep) {		for (int64_t w = 0; w < wSize; w += wSizeStep) {
resVals[w] = conv1dSliceAsContraction(		switch (oper) {
rewriter, loc, lhsVals[linearIndex(kw, w)], rhsVals[kw], resVals[w]);		case Conv:
		resVals[w] = conv1dSliceAsContraction(rewriter, loc,
		lhsVals[linearIndex(kw, w)],
		rhsVals[kw], resVals[w]);
		break;
		case Pool:
		resVals[w] = pool1dSlice(rewriter, loc, lhsVals[linearIndex(kw, w)],
		resVals[w]);
		break;
		}
}		}
		rsuderman: Often if you have an if-else block in a simple loop (like here) it is simpler to do `for (...) { if (condition) { do work; continue; } do other work; }`. This avoids the extra else statement and demonstrates there is only one task.
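The early-`continue` loop shape described above can be sketched as follows. The arithmetic is a deliberately simplified stand-in (the "pooling" branch just adds, the "convolution" branch multiplies then adds); `OperKind` and `accumulateSlices` are hypothetical names and do not reproduce `conv1dSliceAsContraction` or `pool1dSlice`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the Conv/Pool distinction in the patch.
enum class OperKind { Conv, Pool };

// Accumulates into `res` element by element. The special case is handled
// first and ends with `continue`, so no `else` branch is needed.
std::vector<int> accumulateSlices(OperKind oper, const std::vector<int> &lhs,
                                  const std::vector<int> &rhs,
                                  std::vector<int> res) {
  for (std::size_t w = 0; w < res.size(); ++w) {
    if (oper == OperKind::Pool) {
      // Stand-in for pooling: combine input and accumulator only.
      res[w] += lhs[w];
      continue;
    }
    // Stand-in for convolution: multiply by the kernel, then accumulate.
    res[w] += lhs[w] * rhs[w];
  }
  return res;
}
```

Note that `rhs` is never read on the pooling path, which mirrors the patch skipping the kernel read for pooling.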
}		}

// Write back res slice: {n, wSizeStep, f} @ [0, w, 0].		// Write back res slice: {n, wSizeStep, f} @ [0, w, 0].
// This does not depend on kw.		// This does not depend on kw.
for (int64_t w = 0; w < wSize; w += wSizeStep) {		for (int64_t w = 0; w < wSize; w += wSizeStep) {
res = rewriter.create<vector::InsertStridedSliceOp>(		res = rewriter.create<vector::InsertStridedSliceOp>(
loc, resVals[w], res,		loc, resVals[w], res,
/*offsets=*/ArrayRef<int64_t>{0, w, 0},		/*offsets=*/ArrayRef<int64_t>{0, w, 0},
Show All 33 Lines	Value conv1dSliceAsContraction(RewriterBase &rewriter, Location loc,
AffineExpr n, w, f, c;		AffineExpr n, w, f, c;
bindDims(ctx, n, w, f, c);		bindDims(ctx, n, w, f, c);
return rewriter.create<vector::ContractionOp>(		return rewriter.create<vector::ContractionOp>(
loc, lhs, rhs, res,		loc, lhs, rhs, res,
/*indexingMaps=*/MapList{{n, w, c}, {c, f}, {n, w, f}},		/*indexingMaps=*/MapList{{n, w, c}, {c, f}, {n, w, f}},
/*iteratorTypes=*/ArrayRef<vector::IteratorType>{par, par, par, red});		/*iteratorTypes=*/ArrayRef<vector::IteratorType>{par, par, par, red});
}		}

		// Create a reduction: lhs{n, w, c} -> res{n, w, c}
		Value pool1dSlice(RewriterBase &rewriter, Location loc, Value lhs,
		Value res) {
		if (isPoolExt)
		lhs = rewriter.create(loc, poolExtOp, lhs, res.getType())->getResult(0);
		return rewriter
		hanchung: Please remove braces for single statement.
		.create(loc, redOp, ArrayRef<Value>{lhs, res}, res.getType())
		->getResult(0);
		}

/// Generate a vector implementation for:		/// Generate a vector implementation for:
/// ```		/// ```
/// Op def: ( n, w, c, kw)		/// Op def: ( n, w, c, kw)
/// Iters: ({Par(), Par(), Par(), Red()})		/// Iters: ({Par(), Par(), Par(), Red()})
/// Layout: {{n, strideW * w + dilationW * kw, c}, {kw, c}, {n, w, c}}		/// Layout: {{n, strideW * w + dilationW * kw, c}, {kw, c}, {n, w, c}}
/// ```		/// ```
/// kw is always unrolled.		/// kw is always unrolled.
/// TODO: w (resp. kw) is unrolled when the strideW ( resp. dilationW) is		/// TODO: w (resp. kw) is unrolled when the strideW ( resp. dilationW) is
▲ Show 20 Lines • Show All 160 Lines • ▼ Show 20 Lines	if (!iters({Par(), Par(), Par(), Red(), Red()}))
return rewriter.notifyMatchFailure(		return rewriter.notifyMatchFailure(
op, "failed to match conv::Nwc 3-par 2-red");		op, "failed to match conv::Nwc 3-par 2-red");

// No transposition needed.		// No transposition needed.
if (layout({/*lhsIndex*/ {n, strideW * w + dilationW * kw, c},		if (layout({/*lhsIndex*/ {n, strideW * w + dilationW * kw, c},
/*rhsIndex*/ {kw, c, f},		/*rhsIndex*/ {kw, c, f},
/*resIndex*/ {n, w, f}}))		/*resIndex*/ {n, w, f}}))
return conv(Conv1DOpOrder::Nwc);		return conv(Conv1DOpOrder::Nwc);

return rewriter.notifyMatchFailure(op, "not a conv::Nwc layout");		return rewriter.notifyMatchFailure(op, "not a conv::Nwc layout");
}		}

/// Entry point that transposes into the common form:		/// Entry point that transposes into the common form:
/// {{n, c, strideW * w + dilationW * kw}, {f, c, kw}, {n, f, w}}		/// {{n, c, strideW * w + dilationW * kw}, {f, c, kw}, {n, f, w}}
FailureOr<Operation *> generateNcwConv() {		FailureOr<Operation *> generateNcwConv() {
AffineExpr n, w, f, kw, c;		AffineExpr n, w, f, kw, c;
bindDims(ctx, n, f, w, c, kw);		bindDims(ctx, n, f, w, c, kw);
if (!iters({Par(), Par(), Par(), Red(), Red()}))		if (!iters({Par(), Par(), Par(), Red(), Red()}))
return rewriter.notifyMatchFailure(		return rewriter.notifyMatchFailure(
op, "failed to match conv::Ncw 3-par 2-red");		op, "failed to match conv::Ncw 3-par 2-red");

if (layout({/*lhsIndex*/ {n, c, strideW * w + dilationW * kw},		if (layout({/*lhsIndex*/ {n, c, strideW * w + dilationW * kw},
/*rhsIndex*/ {f, c, kw},		/*rhsIndex*/ {f, c, kw},
/*resIndex*/ {n, f, w}}))		/*resIndex*/ {n, f, w}}))
return conv(Conv1DOpOrder::Ncw);		return conv(Conv1DOpOrder::Ncw);

return rewriter.notifyMatchFailure(op, "not a conv::Ncw layout");		return rewriter.notifyMatchFailure(op, "not a conv::Ncw layout");
}		}

/// Entry point that transposes into the common form:		/// Entry point that transposes into the common form:
		/// {{n, strideW * w + dilationW * kw, c}, {kw}, {n, w, c}} for pooling
		FailureOr<Operation *> generateNwcPooling() {
		AffineExpr n, w, c, kw;
		bindDims(ctx, n, w, c, kw);
		if (!iters({Par(), Par(), Par(), Red()}))
		return rewriter.notifyMatchFailure(op,
		"failed to match pooling 3-par 1-red");

		// No transposition needed.
		if (layout({/*lhsIndex*/ {n, strideW * w + dilationW * kw, c},
		/*rhsIndex*/ {kw},
		/*resIndex*/ {n, w, c}}))
		return conv(Conv1DOpOrder::Nwc);

		return rewriter.notifyMatchFailure(op, "not a pooling::Nwc layout");
		}

		/// Entry point that transposes into the common form:
		/// {{n, c, strideW * w + dilationW * kw}, {kw}, {n, c, w}} for pooling
		FailureOr<Operation *> generateNcwPooling() {
		AffineExpr n, w, c, kw;
		bindDims(ctx, n, c, w, kw);
		if (!iters({Par(), Par(), Par(), Red()}))
		return rewriter.notifyMatchFailure(op,
		"failed to match pooling 3-par 1-red");

		if (layout({/*lhsIndex*/ {n, c, strideW * w + dilationW * kw},
		/*rhsIndex*/ {kw},
		/*resIndex*/ {n, c, w}}))
		return conv(Conv1DOpOrder::Ncw);

		return rewriter.notifyMatchFailure(op, "not a pooling::Ncw layout");
		}

		/// Entry point that transposes into the common form:
/// {{n, strideW * w + dilationW * kw, c}, {kw, c}, {n, w, c}}		/// {{n, strideW * w + dilationW * kw, c}, {kw, c}, {n, w, c}}
FailureOr<Operation *> generateDilatedConv() {		FailureOr<Operation *> generateDilatedConv() {
AffineExpr n, w, c, kw;		AffineExpr n, w, c, kw;
bindDims(ctx, n, w, c, kw);		bindDims(ctx, n, w, c, kw);
if (!iters({Par(), Par(), Par(), Red()}))		if (!iters({Par(), Par(), Par(), Red()}))
return rewriter.notifyMatchFailure(		return rewriter.notifyMatchFailure(
op, "failed to match depthwise::Nwc conv 3-par 1-red");		op, "failed to match depthwise::Nwc conv 3-par 1-red");

// No transposition needed.		// No transposition needed.
if (layout({/*lhsIndex*/ {n, strideW * w + dilationW * kw, c},		if (layout({/*lhsIndex*/ {n, strideW * w + dilationW * kw, c},
/*rhsIndex*/ {kw, c},		/*rhsIndex*/ {kw, c},
/*resIndex*/ {n, w, c}}))		/*resIndex*/ {n, w, c}}))
return depthwiseConv();		return depthwiseConv();

return rewriter.notifyMatchFailure(op, "not a depthwise::Nwc layout");		return rewriter.notifyMatchFailure(op, "not a depthwise::Nwc layout");
}		}

private:		private:
		enum OperKind { Conv, Pool };
		hanchung (Done): maybe rename it to `OperType` or `OperKind`?
bool valid = false;		bool valid = false;
		OperKind oper = Conv;
		StringAttr redOp;
		StringAttr poolExtOp;
		bool isPoolExt = false;
		rsuderman (Done): Why are these being handled as globals vs. being included as parameters to the builder?
		vmurali (Author, Done): The code works in two steps: the constructor checks the validity of the op and sets these values, and the transformer (`conv`) performs the transformation after construction. I need these values to keep the state between construction and transformation. (They are inside a class, so not really global.)
		vmurali (Author, Done): Another thing to note is that we never needed to specify the cast op or the reduction op for convolution because (a) convolution was always a sum of products, and (b) contraction automatically took care of casting. But for pooling, neither of these is true (we have multiple kinds of pooling, and we don't use contraction for the equivalent of reduction). Finally, the `isPool` state is simply to distinguish between pooling and convolution. The only state that could potentially be eliminated is `isPoolExt`, relying on null strings for no reduction instead, which I find distasteful.
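The two-step design described in this thread can be sketched in isolation. This is a minimal illustration, not the code under review: `Generator`, its constructor parameters, and `generate` are hypothetical names, and the classification rule is deliberately simplified (the real check also inspects the `mul` feeder).

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical miniature of the validate-then-transform split; none of these
// names are taken from Vectorization.cpp.
enum class OperKind { Conv, Pool };

class Generator {
public:
  // Step 1: validate the input and record what was learned about it.
  // Simplified rule (an assumption): two block arguments means pooling;
  // one block argument means pooling-with-extension if hasExt, else conv.
  Generator(int numBlockArguments, bool hasExt) {
    if (numBlockArguments == 2) {
      oper = OperKind::Pool;
      valid = true;
    } else if (numBlockArguments == 1) {
      oper = hasExt ? OperKind::Pool : OperKind::Conv;
      isPoolExt = hasExt;
      valid = true;
    }
  }

  // Step 2: transform, relying only on state recorded by the constructor.
  std::optional<std::string> generate() const {
    if (!valid)
      return std::nullopt;
    if (oper == OperKind::Conv)
      return std::string("conv");
    return std::string(isPoolExt ? "pool+ext" : "pool");
  }

private:
  bool valid = false;
  OperKind oper = OperKind::Conv;
  bool isPoolExt = false;
};
```

The members play the role rsuderman asked about: they are per-instance state that carries the result of validation into the later transformation call, rather than globals.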
int strideW, dilationW;		int strideW, dilationW;
Value lhsShaped, rhsShaped, resShaped;		Value lhsShaped, rhsShaped, resShaped;
ShapedType lhsShapedType, rhsShapedType, resShapedType;		ShapedType lhsShapedType, rhsShapedType, resShapedType;

		// Sets oper, poolExtOp and isPoolExt for valid conv/pooling ops.
		// Returns true iff it is a valid conv/pooling op.
		// If (region has 2 ops (reduction + yield) or 3 ops (extension + reduction
		// + yield) and rhs is not used) then it is the body of a pooling op.
		// If conv, check for single `mul` predecessor. The `mul` operands must be
		// block arguments or extension of block arguments.
		// Otherwise, check for one or zero `ext` predecessor. The `ext` operands
		// must be block arguments or extension of block arguments.
		bool setOperKind(Operation *reduceOp) {
		dcaballe (Done): We use `LogicalResult` instead of `bool` for this.
		vmurali (Author, Done): I think it just clutters the code (all the returns based on expressions would have to be converted into `LogicalResult` using the constructor, for example in line 2409). It's probably needed if it's a transformation pass, but this is a helper utility which has nothing to do with LLVM.
		dcaballe (Not Done): We only have to s/true/success() and s/false/failure(), right? `LogicalResult` is used all over the place to signal whether something succeeded or failed. It's kind of a convention. It provides more context than a plain bool, which could be interpreted in different ways. Most importantly, it forces the caller to check whether the result is success or failure and handle both outcomes properly (if the return value of the function is ignored, you get a warning). There are a few benefits in general, and I think this is a common case where it's used.
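The substitution dcaballe describes can be sketched without MLIR. This is a minimal standalone illustration: `Result`, `classifyBool`, and `classify` are hypothetical stand-ins, not MLIR's `LogicalResult` API, and `[[nodiscard]]` is assumed here only as an approximation of the ignored-result warning mentioned in the comment.

```cpp
#include <cassert>

// Hypothetical stand-in for a success/failure type; not MLIR's LogicalResult.
// [[nodiscard]] asks the compiler to warn when a returned Result is ignored.
class [[nodiscard]] Result {
public:
  static Result success() { return Result(true); }
  static Result failure() { return Result(false); }
  bool succeeded() const { return ok; }
  bool failed() const { return !ok; }

private:
  explicit Result(bool ok) : ok(ok) {}
  bool ok;
};

// bool version: the caller has to know what `true` means.
inline bool classifyBool(int numBlockArguments) {
  return numBlockArguments == 1 || numBlockArguments == 2;
}

// Result version: the same logic after s/true/success()/ and
// s/false/failure()/; the outcome is named at every return site.
inline Result classify(int numBlockArguments) {
  if (numBlockArguments == 1 || numBlockArguments == 2)
    return Result::success();
  return Result::failure();
}
```

A caller would write `if (classify(n).failed()) return ...;`, which reads the same way regardless of what the helper checks internally.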
		int numBlockArguments =
		llvm::count_if(reduceOp->getOperands(),
		[](Value v) { return v.isa<BlockArgument>(); });
		hanchung (Done): nit: we usually name it with `numBlockArguments`.
		switch (numBlockArguments) {
		case 1: {
		// Will be convolution if feeder is a MulOp.
		// Otherwise, check whether it can be pooling.
		auto feedValIt = llvm::find_if(reduceOp->getOperands(), [](Value v) {
		return !v.isa<BlockArgument>();
		});
		Operation *feedOp = (*feedValIt).getDefiningOp();
		if (isCastOfBlockArgument(feedOp)) {
		oper = Pool;
		isPoolExt = true;
		poolExtOp = feedOp->getName().getIdentifier();
		} else if (!(isa<arith::MulIOp, arith::MulFOp>(feedOp) &&
		llvm::all_of(feedOp->getOperands(), [](Value v) {
		if (v.isa<BlockArgument>())
		return true;
		if (Operation *op = v.getDefiningOp())
		return isCastOfBlockArgument(op);
		return false;
		}))) {
		return false;
		}
		return true;
		}
		case 2:
		// Must be pooling
		oper = Pool;
		isPoolExt = false;
		return true;
		default:
		return false;
		}
		}
};		};
} // namespace		} // namespace

/// Helper function to vectorize a LinalgOp with convolution semantics.		/// Helper function to vectorize a LinalgOp with convolution semantics.
// TODO: extend the generic vectorization to support windows and drop this.		// TODO: extend the generic vectorization to support windows and drop this.
static FailureOr<Operation *> vectorizeConvolution(RewriterBase &rewriter,		static FailureOr<Operation *> vectorizeConvolution(RewriterBase &rewriter,
LinalgOp op) {		LinalgOp op) {
// The ConvolutionOpInterface gives us guarantees of existence for		// The ConvolutionOpInterface gives us guarantees of existence for
// strides/dilations. However, we do not need to rely on those, we can simply		// strides/dilations. However, we do not need to rely on those, we can simply
// use them if present, otherwise use the default and let the generic conv.		// use them if present, otherwise use the default and let the generic conv.
// matcher in the ConvGenerator succeed or fail.		// matcher in the ConvGenerator succeed or fail.
auto strides = op->getAttrOfType<DenseIntElementsAttr>("strides");		auto strides = op->getAttrOfType<DenseIntElementsAttr>("strides");
auto dilations = op->getAttrOfType<DenseIntElementsAttr>("dilations");		auto dilations = op->getAttrOfType<DenseIntElementsAttr>("dilations");
auto stride = strides ? *strides.getValues<uint64_t>().begin() : 1;		auto stride = strides ? *strides.getValues<uint64_t>().begin() : 1;
auto dilation = dilations ? *dilations.getValues<uint64_t>().begin() : 1;		auto dilation = dilations ? *dilations.getValues<uint64_t>().begin() : 1;
Conv1DGenerator e(rewriter, op, stride, dilation);		Conv1DGenerator e(rewriter, op, stride, dilation);
auto res = e.generateNwcConv();		auto res = e.generateNwcConv();
if (succeeded(res))		if (succeeded(res))
return res;		return res;
res = e.generateNcwConv();		res = e.generateNcwConv();
if (succeeded(res))		if (succeeded(res))
return res;		return res;
		res = e.generateNwcPooling();
		if (succeeded(res))
		return res;
		res = e.generateNcwPooling();
		if (succeeded(res))
		return res;
return e.generateDilatedConv();		return e.generateDilatedConv();
}		}

struct VectorizeConvolution : public OpInterfaceRewritePattern<LinalgOp> {		struct VectorizeConvolution : public OpInterfaceRewritePattern<LinalgOp> {
using OpInterfaceRewritePattern::OpInterfaceRewritePattern;		using OpInterfaceRewritePattern::OpInterfaceRewritePattern;

LogicalResult matchAndRewrite(LinalgOp op,		LogicalResult matchAndRewrite(LinalgOp op,
PatternRewriter &rewriter) const override {		PatternRewriter &rewriter) const override {
[18 lines not shown]

mlir/test/Dialect/Linalg/vectorize-convolution.mlir

	[565 lines not shown]
	/// Read the whole data in one shot.			/// Read the whole data in one shot.
	// CHECK: %[[V_INPUT_R:.+]] = vector.transfer_read %[[INPUT]][%[[C0]], %[[C0]], %[[C0]]]			// CHECK: %[[V_INPUT_R:.+]] = vector.transfer_read %[[INPUT]][%[[C0]], %[[C0]], %[[C0]]]
	// CHECK: %[[V_FILTER_R:.+]] = vector.transfer_read %[[FILTER]][%[[C0]], %[[C0]], %[[C0]]]			// CHECK: %[[V_FILTER_R:.+]] = vector.transfer_read %[[FILTER]][%[[C0]], %[[C0]], %[[C0]]]
	// CHECK: %[[V_OUTPUT_R:.+]] = vector.transfer_read %[[OUTPUT]][%[[C0]], %[[C0]], %[[C0]]]			// CHECK: %[[V_OUTPUT_R:.+]] = vector.transfer_read %[[OUTPUT]][%[[C0]], %[[C0]], %[[C0]]]
	// CHECK: %[[V_FILTER_1:.+]] = vector.extract %[[V_FILTER_R]][0] : vector<1x3x2xf16>			// CHECK: %[[V_FILTER_1:.+]] = vector.extract %[[V_FILTER_R]][0] : vector<1x3x2xf16>
	// CHECK: %[[CONT:.*]] = vector.contract			// CHECK: %[[CONT:.*]] = vector.contract
	// {{.*}} %[[V_INPUT_R]], %[[V_FILTER_1]], %[[V_OUTPUT_R]] : vector<1x2x3xf16>, vector<3x2xf16> into vector<1x2x2xf32>			// {{.*}} %[[V_INPUT_R]], %[[V_FILTER_1]], %[[V_OUTPUT_R]] : vector<1x2x3xf16>, vector<3x2xf16> into vector<1x2x2xf32>
	// CHECK: vector.transfer_write %[[CONT]], %[[OUTPUT]][%[[C0]], %[[C0]], %[[C0]]]			// CHECK: vector.transfer_write %[[CONT]], %[[OUTPUT]][%[[C0]], %[[C0]], %[[C0]]]

				// -----
				dcaballe (Done): Missing `-----`?

				func.func @pooling_nwc_sum_memref_1_2_1_3(%input: memref<4x4x3xf32>, %filter: memref<1xf32>, %output: memref<4x2x3xf32>) {
				linalg.pooling_nwc_sum
				{dilations = dense<1> : tensor<1xi64>, strides = dense<3> : tensor<1xi64>}
				ins(%input, %filter : memref<4x4x3xf32>, memref<1xf32>)
				outs(%output : memref<4x2x3xf32>)
				return
				}

				// CHECK-LABEL: func.func @pooling_nwc_sum_memref_1_2_1_3
				dcaballe (Done): We usually use `CHECK-LABEL` to match the function name. It can make the test more resilient to accidental checks.
				vmurali (Author, Done): I was basically following the style of the convolution functions. I can change these.
				// CHECK-SAME: (%[[Varg0:.+]]: memref<4x4x3xf32>, %[[Varg1:.+]]: memref<1xf32>, %[[Varg2:.+]]: memref<4x2x3xf32>)
				hanchung (Not Done): Why not name them `INPUT`, `FILTER`, `OUTPUT`? That would match the naming in the input MLIR better and help when reviewing the test. I don't know if we have a naming style guide for lit tests, but we should keep them consistent. All the variables in this file are upper case with `_`.
				vmurali (Author, Done): These tests were auto-constructed from the output after checking that the outputs were correct, and my script doesn't transform the names other than adding a "V". I can fix it manually, but it's probably not very important, given that these tests will fail appropriately. There are too many variations of convolution to write these tests by hand.
				// CHECK-DAG: %[[Vc0:.+]] = arith.constant 0 : index
				// CHECK-DAG: %[[Vcst:.+]] = arith.constant 0.000000e+00 : f32
				// CHECK: %[[V0:.+]] = vector.transfer_read %[[Varg0]][%[[Vc0]], %[[Vc0]], %[[Vc0]]], %[[Vcst]] {in_bounds = [true, true, true]} : memref<4x4x3xf32>, vector<4x4x3xf32>
				// CHECK: %[[V1:.+]] = vector.transfer_read %[[Varg2]][%[[Vc0]], %[[Vc0]], %[[Vc0]]], %[[Vcst]] {in_bounds = [true, true, true]} : memref<4x2x3xf32>, vector<4x2x3xf32>
				// CHECK: %[[V2:.+]] = vector.extract_strided_slice %[[V0]] {offsets = [0, 0, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x4x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V3:.+]] = vector.extract_strided_slice %[[V0]] {offsets = [0, 3, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x4x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V4:.+]] = vector.extract_strided_slice %[[V1]] {offsets = [0, 0, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x2x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V5:.+]] = vector.extract_strided_slice %[[V1]] {offsets = [0, 1, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x2x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V6:.+]] = arith.addf %[[V2]], %[[V4]] : vector<4x1x3xf32>
				// CHECK: %[[V7:.+]] = arith.addf %[[V3]], %[[V5]] : vector<4x1x3xf32>
				// CHECK: %[[V8:.+]] = vector.insert_strided_slice %[[V6]], %[[V1]] {offsets = [0, 0, 0], strides = [1, 1, 1]} : vector<4x1x3xf32> into vector<4x2x3xf32>
				// CHECK: %[[V9:.+]] = vector.insert_strided_slice %[[V7]], %[[V8]] {offsets = [0, 1, 0], strides = [1, 1, 1]} : vector<4x1x3xf32> into vector<4x2x3xf32>
				// CHECK: vector.transfer_write %[[V9]], %[[Varg2]][%[[Vc0]], %[[Vc0]], %[[Vc0]]] {in_bounds = [true, true, true]} : vector<4x2x3xf32>, memref<4x2x3xf32>
				dcaballe (Done): Please use `CHECK-DAG` only when absolutely necessary. It makes the mapping more expensive and can make debugging an actual nightmare. For example, it's common to use DAG for constants, maps, and globals, but I don't think we need it here for anything else.

				// -----

				func.func @pooling_nwc_max_memref_1_2_1_3(%input: memref<4x4x3xf32>, %filter: memref<1xf32>, %output: memref<4x2x3xf32>) {
				linalg.pooling_nwc_max
				{dilations = dense<1> : tensor<1xi64>, strides = dense<3> : tensor<1xi64>}
				ins(%input, %filter : memref<4x4x3xf32>, memref<1xf32>)
				outs(%output : memref<4x2x3xf32>)
				return
				}

				// CHECK-LABEL: func.func @pooling_nwc_max_memref_1_2_1_3
				// CHECK-SAME: (%[[Varg0:.+]]: memref<4x4x3xf32>, %[[Varg1:.+]]: memref<1xf32>, %[[Varg2:.+]]: memref<4x2x3xf32>)
				// CHECK-DAG: %[[Vc0:.+]] = arith.constant 0 : index
				// CHECK-DAG: %[[Vcst:.+]] = arith.constant 0.000000e+00 : f32
				// CHECK: %[[V0:.+]] = vector.transfer_read %[[Varg0]][%[[Vc0]], %[[Vc0]], %[[Vc0]]], %[[Vcst]] {in_bounds = [true, true, true]} : memref<4x4x3xf32>, vector<4x4x3xf32>
				// CHECK: %[[V1:.+]] = vector.transfer_read %[[Varg2]][%[[Vc0]], %[[Vc0]], %[[Vc0]]], %[[Vcst]] {in_bounds = [true, true, true]} : memref<4x2x3xf32>, vector<4x2x3xf32>
				// CHECK: %[[V2:.+]] = vector.extract_strided_slice %[[V0]] {offsets = [0, 0, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x4x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V3:.+]] = vector.extract_strided_slice %[[V0]] {offsets = [0, 3, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x4x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V4:.+]] = vector.extract_strided_slice %[[V1]] {offsets = [0, 0, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x2x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V5:.+]] = vector.extract_strided_slice %[[V1]] {offsets = [0, 1, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x2x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V6:.+]] = arith.maxf %[[V2]], %[[V4]] : vector<4x1x3xf32>
				// CHECK: %[[V7:.+]] = arith.maxf %[[V3]], %[[V5]] : vector<4x1x3xf32>
				// CHECK: %[[V8:.+]] = vector.insert_strided_slice %[[V6]], %[[V1]] {offsets = [0, 0, 0], strides = [1, 1, 1]} : vector<4x1x3xf32> into vector<4x2x3xf32>
				// CHECK: %[[V9:.+]] = vector.insert_strided_slice %[[V7]], %[[V8]] {offsets = [0, 1, 0], strides = [1, 1, 1]} : vector<4x1x3xf32> into vector<4x2x3xf32>
				// CHECK: vector.transfer_write %[[V9]], %[[Varg2]][%[[Vc0]], %[[Vc0]], %[[Vc0]]] {in_bounds = [true, true, true]} : vector<4x2x3xf32>, memref<4x2x3xf32>

				// -----

				// The i8i8i32 case is similar to f32 case, so checking one case is enough for
				// test coverage.
				func.func @pooling_nwc_sum_i8i8i32_memref_1_2_1_3(%input: memref<4x4x3xi8>, %filter: memref<1xi8>, %output: memref<4x2x3xi32>) {
				linalg.pooling_nwc_sum
				{dilations = dense<1> : tensor<1xi64>, strides = dense<3> : tensor<1xi64>}
				ins(%input, %filter : memref<4x4x3xi8>, memref<1xi8>)
				outs(%output : memref<4x2x3xi32>)
				return
				}

				// CHECK-LABEL: func.func @pooling_nwc_sum_i8i8i32_memref_1_2_1_3
				// CHECK-SAME: (%[[Varg0:.+]]: memref<4x4x3xi8>, %[[Varg1:.+]]: memref<1xi8>, %[[Varg2:.+]]: memref<4x2x3xi32>)
				// CHECK-DAG: %[[Vc0:.+]] = arith.constant 0 : index
				// CHECK-DAG: %[[Vc0_i8:.+]] = arith.constant 0 : i8
				// CHECK-DAG: %[[Vc0_i32:.+]] = arith.constant 0 : i32
				// CHECK: %[[V0:.+]] = vector.transfer_read %[[Varg0]][%[[Vc0]], %[[Vc0]], %[[Vc0]]], %[[Vc0_i8]] {in_bounds = [true, true, true]} : memref<4x4x3xi8>, vector<4x4x3xi8>
				// CHECK: %[[V1:.+]] = vector.transfer_read %[[Varg2]][%[[Vc0]], %[[Vc0]], %[[Vc0]]], %[[Vc0_i32]] {in_bounds = [true, true, true]} : memref<4x2x3xi32>, vector<4x2x3xi32>
				// CHECK: %[[V2:.+]] = vector.extract_strided_slice %[[V0]] {offsets = [0, 0, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x4x3xi8> to vector<4x1x3xi8>
				// CHECK: %[[V3:.+]] = vector.extract_strided_slice %[[V0]] {offsets = [0, 3, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x4x3xi8> to vector<4x1x3xi8>
				// CHECK: %[[V4:.+]] = vector.extract_strided_slice %[[V1]] {offsets = [0, 0, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x2x3xi32> to vector<4x1x3xi32>
				// CHECK: %[[V5:.+]] = vector.extract_strided_slice %[[V1]] {offsets = [0, 1, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x2x3xi32> to vector<4x1x3xi32>
				// CHECK: %[[V6:.+]] = arith.extsi %[[V2]] : vector<4x1x3xi8> to vector<4x1x3xi32>
				// CHECK: %[[V7:.+]] = arith.addi %[[V6]], %[[V4]] : vector<4x1x3xi32>
				// CHECK: %[[V8:.+]] = arith.extsi %[[V3]] : vector<4x1x3xi8> to vector<4x1x3xi32>
				// CHECK: %[[V9:.+]] = arith.addi %[[V8]], %[[V5]] : vector<4x1x3xi32>
				// CHECK: %[[V10:.+]] = vector.insert_strided_slice %[[V7]], %[[V1]] {offsets = [0, 0, 0], strides = [1, 1, 1]} : vector<4x1x3xi32> into vector<4x2x3xi32>
				// CHECK: %[[V11:.+]] = vector.insert_strided_slice %[[V9]], %[[V10]] {offsets = [0, 1, 0], strides = [1, 1, 1]} : vector<4x1x3xi32> into vector<4x2x3xi32>
				// CHECK: vector.transfer_write %[[V11]], %[[Varg2]][%[[Vc0]], %[[Vc0]], %[[Vc0]]] {in_bounds = [true, true, true]} : vector<4x2x3xi32>, memref<4x2x3xi32>
				// CHECK: return

				// -----

				// The i8i8i32 case is similar to f32 case, so checking one case is enough for
				// test coverage.
				func.func @pooling_nwc_max_i8i8i32_memref_1_2_1_3(%input: memref<4x4x3xi8>, %filter: memref<1xi8>, %output: memref<4x2x3xi32>) {
				linalg.pooling_nwc_max
				{dilations = dense<1> : tensor<1xi64>, strides = dense<3> : tensor<1xi64>}
				ins(%input, %filter : memref<4x4x3xi8>, memref<1xi8>)
				outs(%output : memref<4x2x3xi32>)
				return
				}

				// CHECK-LABEL: func.func @pooling_nwc_max_i8i8i32_memref_1_2_1_3
				// CHECK-SAME: (%[[Varg0:.+]]: memref<4x4x3xi8>, %[[Varg1:.+]]: memref<1xi8>, %[[Varg2:.+]]: memref<4x2x3xi32>)
				// CHECK-DAG: %[[Vc0:.+]] = arith.constant 0 : index
				// CHECK-DAG: %[[Vc0_i8:.+]] = arith.constant 0 : i8
				// CHECK-DAG: %[[Vc0_i32:.+]] = arith.constant 0 : i32
				// CHECK: %[[V0:.+]] = vector.transfer_read %[[Varg0]][%[[Vc0]], %[[Vc0]], %[[Vc0]]], %[[Vc0_i8]] {in_bounds = [true, true, true]} : memref<4x4x3xi8>, vector<4x4x3xi8>
				// CHECK: %[[V1:.+]] = vector.transfer_read %[[Varg2]][%[[Vc0]], %[[Vc0]], %[[Vc0]]], %[[Vc0_i32]] {in_bounds = [true, true, true]} : memref<4x2x3xi32>, vector<4x2x3xi32>
				// CHECK: %[[V2:.+]] = vector.extract_strided_slice %[[V0]] {offsets = [0, 0, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x4x3xi8> to vector<4x1x3xi8>
				// CHECK: %[[V3:.+]] = vector.extract_strided_slice %[[V0]] {offsets = [0, 3, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x4x3xi8> to vector<4x1x3xi8>
				// CHECK: %[[V4:.+]] = vector.extract_strided_slice %[[V1]] {offsets = [0, 0, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x2x3xi32> to vector<4x1x3xi32>
				// CHECK: %[[V5:.+]] = vector.extract_strided_slice %[[V1]] {offsets = [0, 1, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x2x3xi32> to vector<4x1x3xi32>
				// CHECK: %[[V6:.+]] = arith.extsi %[[V2]] : vector<4x1x3xi8> to vector<4x1x3xi32>
				// CHECK: %[[V7:.+]] = arith.maxsi %[[V6]], %[[V4]] : vector<4x1x3xi32>
				// CHECK: %[[V8:.+]] = arith.extsi %[[V3]] : vector<4x1x3xi8> to vector<4x1x3xi32>
				// CHECK: %[[V9:.+]] = arith.maxsi %[[V8]], %[[V5]] : vector<4x1x3xi32>
				// CHECK: %[[V10:.+]] = vector.insert_strided_slice %[[V7]], %[[V1]] {offsets = [0, 0, 0], strides = [1, 1, 1]} : vector<4x1x3xi32> into vector<4x2x3xi32>
				// CHECK: %[[V11:.+]] = vector.insert_strided_slice %[[V9]], %[[V10]] {offsets = [0, 1, 0], strides = [1, 1, 1]} : vector<4x1x3xi32> into vector<4x2x3xi32>
				// CHECK: vector.transfer_write %[[V11]], %[[Varg2]][%[[Vc0]], %[[Vc0]], %[[Vc0]]] {in_bounds = [true, true, true]} : vector<4x2x3xi32>, memref<4x2x3xi32>
				// CHECK: return

				// -----

				func.func @pooling_nwc_sum_memref_2_2_2_3(%input: memref<4x6x3xf32>, %filter: memref<2xf32>, %output: memref<4x2x3xf32>) {
				linalg.pooling_nwc_sum
				{dilations = dense<2> : tensor<1xi64>, strides = dense<3> : tensor<1xi64>}
				ins(%input, %filter : memref<4x6x3xf32>, memref<2xf32>)
				outs(%output : memref<4x2x3xf32>)
				return
				}

				// CHECK-LABEL: func.func @pooling_nwc_sum_memref_2_2_2_3
				// CHECK-SAME: (%[[Varg0:.+]]: memref<4x6x3xf32>, %[[Varg1:.+]]: memref<2xf32>, %[[Varg2:.+]]: memref<4x2x3xf32>)
				// CHECK-DAG: %[[Vc0:.+]] = arith.constant 0 : index
				// CHECK-DAG: %[[Vcst:.+]] = arith.constant 0.000000e+00 : f32
				// CHECK: %[[V0:.+]] = vector.transfer_read %[[Varg0]][%[[Vc0]], %[[Vc0]], %[[Vc0]]], %[[Vcst]] {in_bounds = [true, true, true]} : memref<4x6x3xf32>, vector<4x6x3xf32>
				// CHECK: %[[V1:.+]] = vector.transfer_read %[[Varg2]][%[[Vc0]], %[[Vc0]], %[[Vc0]]], %[[Vcst]] {in_bounds = [true, true, true]} : memref<4x2x3xf32>, vector<4x2x3xf32>
				// CHECK: %[[V2:.+]] = vector.extract_strided_slice %[[V0]] {offsets = [0, 0, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x6x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V3:.+]] = vector.extract_strided_slice %[[V0]] {offsets = [0, 3, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x6x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V4:.+]] = vector.extract_strided_slice %[[V0]] {offsets = [0, 2, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x6x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V5:.+]] = vector.extract_strided_slice %[[V0]] {offsets = [0, 5, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x6x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V6:.+]] = vector.extract_strided_slice %[[V1]] {offsets = [0, 0, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x2x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V7:.+]] = vector.extract_strided_slice %[[V1]] {offsets = [0, 1, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x2x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V8:.+]] = arith.addf %[[V2]], %[[V6]] : vector<4x1x3xf32>
				// CHECK: %[[V9:.+]] = arith.addf %[[V3]], %[[V7]] : vector<4x1x3xf32>
				// CHECK: %[[V10:.+]] = arith.addf %[[V4]], %[[V8]] : vector<4x1x3xf32>
				// CHECK: %[[V11:.+]] = arith.addf %[[V5]], %[[V9]] : vector<4x1x3xf32>
				// CHECK: %[[V12:.+]] = vector.insert_strided_slice %[[V10]], %[[V1]] {offsets = [0, 0, 0], strides = [1, 1, 1]} : vector<4x1x3xf32> into vector<4x2x3xf32>
				// CHECK: %[[V13:.+]] = vector.insert_strided_slice %[[V11]], %[[V12]] {offsets = [0, 1, 0], strides = [1, 1, 1]} : vector<4x1x3xf32> into vector<4x2x3xf32>
				// CHECK: vector.transfer_write %[[V13]], %[[Varg2]][%[[Vc0]], %[[Vc0]], %[[Vc0]]] {in_bounds = [true, true, true]} : vector<4x2x3xf32>, memref<4x2x3xf32>


				// -----

				func.func @pooling_ncw_sum_memref_1_2_1_3(%input: memref<4x3x4xf32>, %filter: memref<1xf32>, %output: memref<4x3x2xf32>) {
				linalg.pooling_ncw_sum
				{dilations = dense<1> : tensor<1xi64>, strides = dense<3> : tensor<1xi64>}
				ins(%input, %filter : memref<4x3x4xf32>, memref<1xf32>)
				outs(%output : memref<4x3x2xf32>)
				return
				}

				// CHECK-LABEL: func.func @pooling_ncw_sum_memref_1_2_1_3
				// CHECK-SAME: (%[[Varg0:.+]]: memref<4x3x4xf32>, %[[Varg1:.+]]: memref<1xf32>, %[[Varg2:.+]]: memref<4x3x2xf32>)
				// CHECK-DAG: %[[Vc0:.+]] = arith.constant 0 : index
				// CHECK-DAG: %[[Vcst:.+]] = arith.constant 0.000000e+00 : f32
				// CHECK: %[[V0:.+]] = vector.transfer_read %[[Varg0]][%[[Vc0]], %[[Vc0]], %[[Vc0]]], %[[Vcst]] {in_bounds = [true, true, true]} : memref<4x3x4xf32>, vector<4x3x4xf32>
				// CHECK: %[[V1:.+]] = vector.transfer_read %[[Varg2]][%[[Vc0]], %[[Vc0]], %[[Vc0]]], %[[Vcst]] {in_bounds = [true, true, true]} : memref<4x3x2xf32>, vector<4x3x2xf32>
				// CHECK: %[[V2:.+]] = vector.transpose %[[V0]], [0, 2, 1] : vector<4x3x4xf32> to vector<4x4x3xf32>
				// CHECK: %[[V3:.+]] = vector.transpose %[[V1]], [0, 2, 1] : vector<4x3x2xf32> to vector<4x2x3xf32>
				// CHECK: %[[V4:.+]] = vector.extract_strided_slice %[[V2]] {offsets = [0, 0, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x4x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V5:.+]] = vector.extract_strided_slice %[[V2]] {offsets = [0, 3, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x4x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V6:.+]] = vector.extract_strided_slice %[[V3]] {offsets = [0, 0, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x2x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V7:.+]] = vector.extract_strided_slice %[[V3]] {offsets = [0, 1, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x2x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V8:.+]] = arith.addf %[[V4]], %[[V6]] : vector<4x1x3xf32>
				// CHECK: %[[V9:.+]] = arith.addf %[[V5]], %[[V7]] : vector<4x1x3xf32>
				// CHECK: %[[V10:.+]] = vector.insert_strided_slice %[[V8]], %[[V3]] {offsets = [0, 0, 0], strides = [1, 1, 1]} : vector<4x1x3xf32> into vector<4x2x3xf32>
				// CHECK: %[[V11:.+]] = vector.insert_strided_slice %[[V9]], %[[V10]] {offsets = [0, 1, 0], strides = [1, 1, 1]} : vector<4x1x3xf32> into vector<4x2x3xf32>
				// CHECK: %[[V12:.+]] = vector.transpose %[[V11]], [0, 2, 1] : vector<4x2x3xf32> to vector<4x3x2xf32>
				// CHECK: vector.transfer_write %[[V12]], %[[Varg2]][%[[Vc0]], %[[Vc0]], %[[Vc0]]] {in_bounds = [true, true, true]} : vector<4x3x2xf32>, memref<4x3x2xf32>


				// -----

				func.func @pooling_nwc_sum_mixed_type_memref_1_2_1_1(%input: memref<1x2x3xf16>, %filter: memref<1xf16>, %output: memref<1x2x3xf32>) {
				linalg.pooling_nwc_sum
				{dilations = dense<1> : vector<1xi64>, strides = dense<1> : vector<1xi64>}
				ins(%input, %filter : memref<1x2x3xf16>, memref<1xf16>)
				outs(%output : memref<1x2x3xf32>)
				return
				}

				// CHECK-LABEL: func.func @pooling_nwc_sum_mixed_type_memref_1_2_1_1
				// CHECK-SAME: (%[[Varg0:.+]]: memref<1x2x3xf16>, %[[Varg1:.+]]: memref<1xf16>, %[[Varg2:.+]]: memref<1x2x3xf32>)
				// CHECK-DAG: %[[Vc0:.+]] = arith.constant 0 : index
				// CHECK-DAG: %[[Vcst:.+]] = arith.constant 0.000000e+00 : f16
				// CHECK-DAG: %[[Vcst_0:.+]] = arith.constant 0.000000e+00 : f32
				// CHECK: %[[V0:.+]] = vector.transfer_read %[[Varg0]][%[[Vc0]], %[[Vc0]], %[[Vc0]]], %[[Vcst]] {in_bounds = [true, true, true]} : memref<1x2x3xf16>, vector<1x2x3xf16>
				// CHECK: %[[V1:.+]] = vector.transfer_read %[[Varg2]][%[[Vc0]], %[[Vc0]], %[[Vc0]]], %[[Vcst_0]] {in_bounds = [true, true, true]} : memref<1x2x3xf32>, vector<1x2x3xf32>
				// CHECK: %[[V2:.+]] = arith.extf %[[V0]] : vector<1x2x3xf16> to vector<1x2x3xf32>
				// CHECK: %[[V3:.+]] = arith.addf %[[V2]], %[[V1]] : vector<1x2x3xf32>
				// CHECK: vector.transfer_write %[[V3]], %[[Varg2]][%[[Vc0]], %[[Vc0]], %[[Vc0]]] {in_bounds = [true, true, true]} : vector<1x2x3xf32>, memref<1x2x3xf32>

				// -----

				func.func @pooling_nwc_sum_memref_2_2_2_1(%input: memref<4x4x3xf32>, %filter: memref<2xf32>, %output: memref<4x2x3xf32>) {
				linalg.pooling_nwc_sum
				{dilations = dense<2> : tensor<1xi64>, strides = dense<1> : tensor<1xi64>}
				ins(%input, %filter : memref<4x4x3xf32>, memref<2xf32>)
				outs(%output : memref<4x2x3xf32>)
				return
				}

				// CHECK-LABEL: func.func @pooling_nwc_sum_memref_2_2_2_1
				// CHECK-SAME: (%[[Varg0:.+]]: memref<4x4x3xf32>, %[[Varg1:.+]]: memref<2xf32>, %[[Varg2:.+]]: memref<4x2x3xf32>)
				// CHECK-DAG: %[[Vc0:.+]] = arith.constant 0 : index
				// CHECK-DAG: %[[Vcst:.+]] = arith.constant 0.000000e+00 : f32
				// CHECK: %[[V0:.+]] = vector.transfer_read %[[Varg0]][%[[Vc0]], %[[Vc0]], %[[Vc0]]], %[[Vcst]] {in_bounds = [true, true, true]} : memref<4x4x3xf32>, vector<4x4x3xf32>
				// CHECK: %[[V1:.+]] = vector.transfer_read %[[Varg2]][%[[Vc0]], %[[Vc0]], %[[Vc0]]], %[[Vcst]] {in_bounds = [true, true, true]} : memref<4x2x3xf32>, vector<4x2x3xf32>
				// CHECK: %[[V2:.+]] = vector.extract_strided_slice %[[V0]] {offsets = [0, 0, 0], sizes = [4, 2, 3], strides = [1, 1, 1]} : vector<4x4x3xf32> to vector<4x2x3xf32>
				// CHECK: %[[V3:.+]] = vector.extract_strided_slice %[[V0]] {offsets = [0, 2, 0], sizes = [4, 2, 3], strides = [1, 1, 1]} : vector<4x4x3xf32> to vector<4x2x3xf32>
				// CHECK: %[[V4:.+]] = arith.addf %[[V2]], %[[V1]] : vector<4x2x3xf32>
				// CHECK: %[[V5:.+]] = arith.addf %[[V3]], %[[V4]] : vector<4x2x3xf32>
				// CHECK: vector.transfer_write %[[V5]], %[[Varg2]][%[[Vc0]], %[[Vc0]], %[[Vc0]]] {in_bounds = [true, true, true]} : vector<4x2x3xf32>, memref<4x2x3xf32>


				// -----

				func.func @pooling_ncw_sum_memref_2_2_2_3(%input: memref<4x3x6xf32>, %filter: memref<2xf32>, %output: memref<4x3x2xf32>) {
				linalg.pooling_ncw_sum
				{dilations = dense<2> : tensor<1xi64>, strides = dense<3> : tensor<1xi64>}
				ins(%input, %filter : memref<4x3x6xf32>, memref<2xf32>)
				outs(%output : memref<4x3x2xf32>)
				return
				}

				// CHECK-LABEL: func.func @pooling_ncw_sum_memref_2_2_2_3
				// CHECK-SAME: (%[[Varg0:.+]]: memref<4x3x6xf32>, %[[Varg1:.+]]: memref<2xf32>, %[[Varg2:.+]]: memref<4x3x2xf32>)
				// CHECK-DAG: %[[Vc0:.+]] = arith.constant 0 : index
				// CHECK-DAG: %[[Vcst:.+]] = arith.constant 0.000000e+00 : f32
				// CHECK: %[[V0:.+]] = vector.transfer_read %[[Varg0]][%[[Vc0]], %[[Vc0]], %[[Vc0]]], %[[Vcst]] {in_bounds = [true, true, true]} : memref<4x3x6xf32>, vector<4x3x6xf32>
				// CHECK: %[[V1:.+]] = vector.transfer_read %[[Varg2]][%[[Vc0]], %[[Vc0]], %[[Vc0]]], %[[Vcst]] {in_bounds = [true, true, true]} : memref<4x3x2xf32>, vector<4x3x2xf32>
				// CHECK: %[[V2:.+]] = vector.transpose %[[V0]], [0, 2, 1] : vector<4x3x6xf32> to vector<4x6x3xf32>
				// CHECK: %[[V3:.+]] = vector.transpose %[[V1]], [0, 2, 1] : vector<4x3x2xf32> to vector<4x2x3xf32>
				// CHECK: %[[V4:.+]] = vector.extract_strided_slice %[[V2]] {offsets = [0, 0, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x6x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V5:.+]] = vector.extract_strided_slice %[[V2]] {offsets = [0, 3, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x6x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V6:.+]] = vector.extract_strided_slice %[[V2]] {offsets = [0, 2, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x6x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V7:.+]] = vector.extract_strided_slice %[[V2]] {offsets = [0, 5, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x6x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V8:.+]] = vector.extract_strided_slice %[[V3]] {offsets = [0, 0, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x2x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V9:.+]] = vector.extract_strided_slice %[[V3]] {offsets = [0, 1, 0], sizes = [4, 1, 3], strides = [1, 1, 1]} : vector<4x2x3xf32> to vector<4x1x3xf32>
				// CHECK: %[[V10:.+]] = arith.addf %[[V4]], %[[V8]] : vector<4x1x3xf32>
				// CHECK: %[[V11:.+]] = arith.addf %[[V5]], %[[V9]] : vector<4x1x3xf32>
				// CHECK: %[[V12:.+]] = arith.addf %[[V6]], %[[V10]] : vector<4x1x3xf32>
				// CHECK: %[[V13:.+]] = arith.addf %[[V7]], %[[V11]] : vector<4x1x3xf32>
				// CHECK: %[[V14:.+]] = vector.insert_strided_slice %[[V12]], %[[V3]] {offsets = [0, 0, 0], strides = [1, 1, 1]} : vector<4x1x3xf32> into vector<4x2x3xf32>
				// CHECK: %[[V15:.+]] = vector.insert_strided_slice %[[V13]], %[[V14]] {offsets = [0, 1, 0], strides = [1, 1, 1]} : vector<4x1x3xf32> into vector<4x2x3xf32>
				// CHECK: %[[V16:.+]] = vector.transpose %[[V15]], [0, 2, 1] : vector<4x2x3xf32> to vector<4x3x2xf32>
				// CHECK: vector.transfer_write %[[V16:.+]], %[[Varg2]][%[[Vc0]], %[[Vc0]], %[[Vc0]]] {in_bounds = [true, true, true]} : vector<4x3x2xf32>, memref<4x3x2xf32>

				// -----

				func.func @pooling_ncw_sum_memref_2_3_2_1(%input: memref<4x2x5xf32>, %filter: memref<2xf32>, %output: memref<4x2x3xf32>) {
				linalg.pooling_ncw_sum
				{dilations = dense<2> : tensor<1xi64>, strides = dense<1> : tensor<1xi64>}
				ins(%input, %filter : memref<4x2x5xf32>, memref<2xf32>)
				outs(%output : memref<4x2x3xf32>)
				return
				}

				// CHECK-LABEL: func.func @pooling_ncw_sum_memref_2_3_2_1
				// CHECK-SAME: (%[[Varg0:.+]]: memref<4x2x5xf32>, %[[Varg1:.+]]: memref<2xf32>, %[[Varg2:.+]]: memref<4x2x3xf32>)
				// CHECK-DAG: %[[Vc0:.+]] = arith.constant 0 : index
				// CHECK-DAG: %[[Vcst:.+]] = arith.constant 0.000000e+00 : f32
				// CHECK: %[[V0:.+]] = vector.transfer_read %[[Varg0]][%[[Vc0]], %[[Vc0]], %[[Vc0]]], %[[Vcst]] {in_bounds = [true, true, true]} : memref<4x2x5xf32>, vector<4x2x5xf32>
				// CHECK: %[[V1:.+]] = vector.transfer_read %[[Varg2]][%[[Vc0]], %[[Vc0]], %[[Vc0]]], %[[Vcst]] {in_bounds = [true, true, true]} : memref<4x2x3xf32>, vector<4x2x3xf32>
				// CHECK: %[[V2:.+]] = vector.transpose %[[V0]], [0, 2, 1] : vector<4x2x5xf32> to vector<4x5x2xf32>
				// CHECK: %[[V3:.+]] = vector.transpose %[[V1]], [0, 2, 1] : vector<4x2x3xf32> to vector<4x3x2xf32>
				// CHECK: %[[V4:.+]] = vector.extract_strided_slice %[[V2]] {offsets = [0, 0, 0], sizes = [4, 3, 2], strides = [1, 1, 1]} : vector<4x5x2xf32> to vector<4x3x2xf32>
				// CHECK: %[[V5:.+]] = vector.extract_strided_slice %[[V2]] {offsets = [0, 2, 0], sizes = [4, 3, 2], strides = [1, 1, 1]} : vector<4x5x2xf32> to vector<4x3x2xf32>
				// CHECK: %[[V6:.+]] = arith.addf %[[V4]], %[[V3]] : vector<4x3x2xf32>
				// CHECK: %[[V7:.+]] = arith.addf %[[V5]], %[[V6]] : vector<4x3x2xf32>
				// CHECK: %[[V8:.+]] = vector.transpose %[[V7]], [0, 2, 1] : vector<4x3x2xf32> to vector<4x2x3xf32>
				// CHECK: vector.transfer_write %[[V8:.+]], %[[Varg2]][%[[Vc0]], %[[Vc0]], %[[Vc0]]] {in_bounds = [true, true, true]} : vector<4x2x3xf32>, memref<4x2x3xf32>
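
The offsets in the `vector.extract_strided_slice` CHECK lines above follow the usual 1D pooling index relation, input index = `w * stride + kw * dilation`. The sketch below is a plain-Python illustration of that arithmetic, not the vectorizer's implementation. The helper names (`slice_plan`, `pool_sum_direct`, `pool_sum_sliced`) are hypothetical, and the split between a stride-1 case (one full-width slice per kernel tap) and an unrolled case (one size-1 slice per output position) is inferred only from the expected output in these tests.

```python
def slice_plan(out_w, kernel_w, stride, dilation):
    """Return (input_offset, output_offset, size) triples along W.

    Ordering (kernel tap outer, output position inner) mirrors the order
    of the extract_strided_slice CHECK lines; this is an assumption read
    off the tests, not taken from Vectorization.cpp.
    """
    if stride == 1:
        # One slice per kernel tap, covering every output position at once.
        return [(kw * dilation, 0, out_w) for kw in range(kernel_w)]
    # Otherwise unroll over W: one size-1 slice per (tap, output position).
    return [(w * stride + kw * dilation, w, 1)
            for kw in range(kernel_w) for w in range(out_w)]


def min_input_w(out_w, kernel_w, stride, dilation):
    """Smallest input extent along W that the pooling window can read."""
    return (out_w - 1) * stride + (kernel_w - 1) * dilation + 1


def pool_sum_direct(inp, acc, kernel_w, stride, dilation):
    """Reference 1D sum pooling on a single row, accumulating into acc."""
    return [acc[w] + sum(inp[w * stride + kw * dilation]
                         for kw in range(kernel_w))
            for w in range(len(acc))]


def pool_sum_sliced(inp, acc, kernel_w, stride, dilation):
    """Same result, computed by adding input slices to the accumulator."""
    out = list(acc)
    for in_off, out_off, size in slice_plan(len(acc), kernel_w,
                                            stride, dilation):
        for i in range(size):
            out[out_off + i] += inp[in_off + i]
    return out
```

For `pooling_ncw_sum_memref_2_2_2_3` (output width 2, kernel width 2, stride 3, dilation 2), `slice_plan` gives input offsets 0, 3, 2, 5, matching `V4` through `V7`, and `min_input_w` gives 6, matching the input width of `memref<4x3x6xf32>`.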

[mlir][linalg] Vectorize 1D pooling ops (Closed, Public)

Revision Contents

Diff 486691

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp

mlir/test/Dialect/Linalg/vectorize-convolution.mlir
