This is an archive of the discontinued LLVM Phabricator instance.

[mlir][linalg] Omit printing result types for named ops.
Needs Revision · Public

Authored by pifon2a on Jan 15 2023, 2:57 PM.

Details

Summary

Result types for the named ops can be deduced from the list of inits/outs.

This affects only named ops generated from the YAML file, e.g. linalg.matmul, linalg.fill, and the like. After this change, all Linalg ops except linalg.generic will have the following syntax:

opname ins(list_of_input_args: list_of_input_types) outs(list_of_init_args: list_of_init_types).

The regex that was used to fix most of the tests: s/ outs(\(.*\)) -> .*$/ outs(\1)/g.
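As an illustration (the IR line below is a hypothetical example, not taken from an actual test file), the substitution keeps the outs(...) clause intact and drops the trailing result type:

```shell
# Hypothetical test line using the old named-op syntax, with an explicit result type.
old='%0 = linalg.matmul ins(%a, %b : tensor<4x8xf32>, tensor<8x16xf32>) outs(%c : tensor<4x16xf32>) -> tensor<4x16xf32>'

# The substitution from the summary: keep the outs(...) clause, drop "-> <type>".
new=$(printf '%s\n' "$old" | sed 's/ outs(\(.*\)) -> .*$/ outs(\1)/')

printf '%s\n' "$new"
```

After the substitution, the line reads `%0 = linalg.matmul ins(%a, %b : tensor<4x8xf32>, tensor<8x16xf32>) outs(%c : tensor<4x16xf32>)`, which matches the new syntax.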

Diff Detail

Event Timeline

pifon2a created this revision. Jan 15 2023, 2:57 PM
Herald added a reviewer: ftynse. · View Herald Transcript
Herald added a reviewer: aartbik. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
pifon2a requested review of this revision. Jan 15 2023, 2:57 PM
mravishankar requested changes to this revision. Jan 15 2023, 3:52 PM

This is an unnecessary change that just causes downstream churn. Not sure it is worth it

This revision now requires changes to proceed. Jan 15 2023, 3:52 PM

This is an unnecessary change that just causes downstream churn. Not sure it is worth it

It makes IR shorter and nicer. It is not a lot of churn; the named ops are not as widely used as linalg.generic. 90% of the tests were fixed by a simple sed command with s/ outs(\(.*\)) -> .*$/ outs(\1)/g.

The "churn" aspect is subjective. There were patches, like https://github.com/llvm/llvm-project/commit/abc362a1077b9cb4186e3e53a616589c7fed4387, that were not blocked.

This is an unnecessary change that just causes downstream churn. Not sure it is worth it

It makes IR shorter and nicer. It is not a lot of churn; the named ops are not as widely used as linalg.generic. 90% of the tests were fixed by a simple sed command with s/ outs(\(.*\)) -> .*$/ outs(\1)/g.

The "churn" aspect is subjective. There were patches, like https://github.com/llvm/llvm-project/commit/abc362a1077b9cb4186e3e53a616589c7fed4387, that were not blocked.

Some other similar patches were blocked (I don't have the link handy). Apart from ergonomics, does it fix anything else? If not, I'd rather ignore the ergonomics aspect of this and drop this patch.

I think it is worth cleaning this up. Small things like these can make it look like linalg is a second-class dialect where we don't care about code quality.

I think it is worth cleaning this up. Small things like these can make it look like linalg is a second-class dialect where we don't care about code quality.

Disagree with this characterization. I have explicitly heard from other downstream users (so not IREE/Google) that the thing they would prefer is stability in the IR format and API. Breaking changes are what cause people headaches. Also, the explicit return type specification in the IR makes it more explicit that a value of that type is being created by this operation. The implicit tying is more confusing to someone new to MLIR, IMO.

herhut accepted this revision. Jan 16 2023, 1:26 PM

I am in favor of landing this (but please ensure consensus before doing so). I have argued against changes before that were purely aesthetic but introduced a lot of churn (like the one in https://reviews.llvm.org/D133076). Ultimately it is a subjective decision. I have a different opinion in this case because the linalg dialect has a smaller usage scope compared to arith, so I am willing to accept more churn. Also, this change improves ergonomics when reading linalg IR, which we increasingly do and hence care about. This was also one of the motivations to introduce new operations to linalg like map.

The cost of these kinds of changes will only grow, so I feel like we should use our chance to clean things up while we can.

Please change the description of the PR to clarify that it affects the linalg dialect, and add instructions on how to fix tests to the PR description. That way people do not have to read the review to know what needs to be done.

pifon2a retitled this revision from "[mlir] Omit printing result types for named ops." to "[mlir][linalg] Omit printing result types for named ops.". Jan 16 2023, 2:29 PM
pifon2a edited the summary of this revision. (Show Details)
pifon2a edited the summary of this revision. (Show Details) Jan 16 2023, 2:29 PM
mravishankar added a comment. Edited · Jan 16 2023, 5:37 PM

I am in favor of landing this (but please ensure consensus before doing so). I have argued against changes before that were purely aesthetic but introduced a lot of churn (like the one in https://reviews.llvm.org/D133076). Ultimately it is a subjective decision. I have a different opinion in this case because the linalg dialect has a smaller usage scope compared to arith, so I am willing to accept more churn. Also, this change improves ergonomics when reading linalg IR, which we increasingly do and hence care about. This was also one of the motivations to introduce new operations to linalg like map.

I don't see any reason why linalg ops need to be readable. Also, readable means different things to different people. To me this change actually makes it less readable. Before, it is clear what the result type is. With the change, you have to know the implicit type coupling between outs operands and results to understand it.

W.r.t. ops like map, reduce, etc., I am not sure the cost of all the named op definitions and the parser/printer (and all subsequent changes to the parser/printer of those ops) is really worth it. I would still highly prefer those ops be removed.

The cost of these kinds of changes will only grow, so I feel like we should use our chance to clean things up while we can.

Please change the description of the PR to clarify that it affects the linalg dialect, and add instructions on how to fix tests to the PR description. That way people do not have to read the review to know what needs to be done.

As a meta-point: I'm concerned about the push-back on improving things based on churn. The comment about users wanting stability can only be an argument about spending more time on design and being more thorough before adding anything to the codebase. However that means much less velocity and higher cost to experiment. So far this isn't the tradeoff we've been making in MLIR I believe.

I don't see any reason why linalg ops need to be readable.

Can you elaborate? I'm surprised, because taken at face value, this comment is quite against the concept of custom printers in MLIR itself!

Also, readable means different things to different people. To me this change actually makes it less readable. Before, it is clear what the result type is. With the change, you have to know the implicit type coupling between outs operands and results to understand it.

Eliding return types when they are coupled is quite the norm in upstream dialects, I believe; what makes it different in your view here?

Why is linalg.generic different? If this is going to be changed later, it would be nice to batch those changes together, since breaking compatibility downstream has a cost (as discussed).

@ThomasRaoux I would really like to do that, but linalg.generic allows the mixed case, when you have memrefs and tensors at the same time, and some folks are using it for some reason.

@ThomasRaoux I would really like to do that, but linalg.generic allows the mixed case, when you have memrefs and tensors at the same time, and some folks are using it for some reason.

Interesting, and in this case we cannot infer the result type based on operands? Isn’t the result type always matching the outs tensors?

I think Mehdi has a good point about spending more time on design. I agree clean-ups are nice; however, I feel like the incremental breaking changes are causing the problem and we shouldn't take those lightly. I believe there have been several syntax changes proposed for linalg ops in the recent past? Should we make sure we have a syntax that won't need more changes in the near future before making those changes? (Maybe it is worth a quick chat on Discourse to make sure we address all the potential syntax problems at once and not continuously break users.) What do you think?

From my experience integrating MLIR into Google's internal repository, breakages from changes like these seem to be very frequent (e.g., the two already mentioned above), so I don't see why this change should be blocked while others get a pass. If we want to be more careful about API breakages, then we should first formalize this somehow and apply the same rules to everyone.
I never considered the printed version of MLIR stable, and I'm not sure that its stability should be the goal that prevents improvements to the printing format.
Regarding churn, we should not forget that not improving the printed format also causes churn every time someone is reading or modifying it.
I think that redundancy in IR causes more issues than it solves, so I would prefer landing this change.

This is an unnecessary change that just causes downstream churn. Not sure it is worth it

Looking at the previous discussion, I think that people's opinions differ on this one. However, I would like to touch on the phrasing of this comment. The very fact that someone spent the time to make this change means that they do think it is necessary. Thus, in my opinion, stating that it is unnecessary in such absolutist terms is disrespectful. The same point could have been achieved by saying something like "I disagree with this change because <your argumentation>". Even if you know Alex personally, and you're sure he wouldn't mind the statement, this is still a public forum where others (maybe potential future contributors) will read these discussions. I think we should be more careful in how we communicate, and make sure that this forum is an inclusive place where others' opinions are respected.

Interesting, and in this case we cannot infer the result type based on operands? Isn’t the result type always matching the outs tensors?

I think Mehdi has a good point about spending more time on design. I agree clean-ups are nice; however, I feel like the incremental breaking changes are causing the problem and we shouldn't take those lightly. I believe there have been several syntax changes proposed for linalg ops in the recent past? Should we make sure we have a syntax that won't need more changes in the near future before making those changes? (Maybe it is worth a quick chat on Discourse to make sure we address all the potential syntax problems at once and not continuously break users.) What do you think?

Sure, then I would like to remove result types from ALL linalg ops. I can prepare the patch, if you are fine with it.

I would still highly prefer those ops be removed.

Why? If you want these ops to be removed, should we also remove linalg.matmul, linalg.fill? If not, then what's the difference?

[not a statement of endorsement of either form]

I think that redundancy in IR causes more issues than it solves, so I would prefer landing this change

The redundancy is in the printed textual debugging format, not in the IR. For me the real question is what is more readable and easy to understand while debugging for the average engineer/user (I've done this wrong a few times and ended up with something that is nice for me to read and write but doesn't serve this goal [and I'll cause breakages when I get around to fixing it...]). No redundancy is good, but if reading requires solving a puzzle, then some redundancy is better. In this case I'm not sure, TBH; it comes down to knowing the linalg outs convention vs. only the general MLIR convention/mathematical notation (and redundancy could also be removed by dropping the types from the outs section, if redundancy is the main concern).

Mehdi mentioned not to block improvements based on the churn they produce (and that was a general point, not specific to this change, I believe). I semi-agree, as this is an unstable debugging textual format. But I've also seen a cosmetic change requiring more than a week of around-the-clock SWE time (so probably 3+ SWE weeks using regular measurements) for a change that only some liked more. The delta improvement vs. the cost wasn't really worth it IMHO, but improvement is not an objective measurement and others may value that improvement higher. Now, having a script or tool that does the update changes such calculations (e.g., Alex showing how to run an update here). Thomas also has a good point about many small changes; we had something like >20 small breaking changes in one segment that could have just been one and done, so that at least folks' real work is not broken up multiple times.

So, it looks like launching a sed command is churn. Comments are not churn, even though they take much more time and effort.

I would like to have a single format for all DPS ops (linalg/thlo/linalg_ext). I would also prefer to remove result types, as was already done for map/reduce/broadcast/transpose + tHLO ops, since the DPS interface ties results and outs.

and redundancy could also be removed by dropping the types from the outs section, if redundancy is the main concern.

Then the ins operand list would look different from the outs list. I think it would look/read worse.

In strong favour of this.
All IR should be readable, especially linalg, which is not the easiest dialect.

Interesting, and in this case we cannot infer the result type based on operands? Isn’t the result type always matching the outs tensors?

I think Mehdi has a good point about spending more time on design. I agree clean-ups are nice; however, I feel like the incremental breaking changes are causing the problem and we shouldn't take those lightly. I believe there have been several syntax changes proposed for linalg ops in the recent past? Should we make sure we have a syntax that won't need more changes in the near future before making those changes? (Maybe it is worth a quick chat on Discourse to make sure we address all the potential syntax problems at once and not continuously break users.) What do you think?

Sure, then I would like to remove result types from ALL linalg ops. I can prepare the patch, if you are fine with it.

It makes sense to me to have this be consistent. But again, what I mostly want to make sure of is that we don't do another breaking change in the syntax in two weeks.

So, it looks like launching a sed command is churn. Comments are not churn, even though they take much more time and effort.

In my experience, the main problem hasn't been only the integration but the fact that it disturbs developers' flow. For instance, developers often store models/IR in linalg format to be able to work in isolation from the front end. Those kinds of changes force them to regenerate models.

As Jacques mentioned, readable IR is subjective and compatibility-breaking changes are expensive. So improvements are great, but we should treat them like other important design changes by making sure we discuss and reach a design that we think can be stable. This way we can point developers to it when they suggest more changes, and change it only based on new information.

As a meta-point: I'm concerned about the push-back on improving things based on churn. The comment about users wanting stability can only be an argument about spending more time on design and being more thorough before adding anything to the codebase. However that means much less velocity and higher cost to experiment. So far this isn't the tradeoff we've been making in MLIR I believe.

I am not concerned about churn. Churn is part of working at HEAD with MLIR. But that doesn't mean all churn is good. There was a lot of churn when ops were split out of the standard dialect, but that was done deliberately to get to a better end state. I am not convinced this is a better end state. It is a parser/printer change that is made with the argument of increasing readability. Firstly, I am not sure it does (now you need to know that the outs operand type corresponds to the result type, and it only works for named ops, not linalg.generic). Second, if it is not a clear win, and in the absence of more justification, I would vote for the status quo so that we don't subject all downstream users to having to update their lit tests for seemingly no benefit.
Saying that there are other changes that did this is not justification enough, IMO. There were changes that were also blocked (Jeff wanted to make a change that added a comma to some arith dialect ops; there was also a discussion on changing outs to init for Linalg ops that I wasn't sure was worth it, for the same reason, even though I prefer init).

I don't see any reason why linalg ops need to be readable.

Can you elaborate? I'm surprised, because taken at face value, this comment is quite against the concept of custom printers in MLIR itself!

Ok, fair point. Narrowing the scope of my comment: I think this change does not increase the readability of Linalg ops. I understand personal preference, but I am mostly arguing for the status quo when the distinction comes down to personal preferences.

Eliding return types when they are coupled is quite the norm in upstream dialects, I believe; what makes it different in your view here?

If the ops already do this, that is the status quo. I am just saying I don't think there is enough justification to change the status quo.

From my experience integrating MLIR into Google's internal repository, breakages from changes like these seem to be very frequent (e.g., the two already mentioned above), so I don't see why this change should be blocked while others get a pass. If we want to be more careful about API breakages, then we should first formalize this somehow and apply the same rules to everyone.
I never considered the printed version of MLIR stable, and I'm not sure that its stability should be the goal that prevents improvements to the printing format.
Regarding churn, we should not forget that not improving the printed format also causes churn every time someone is reading or modifying it.
I think that redundancy in IR causes more issues than it solves, so I would prefer landing this change.

This is an unnecessary change that just causes downstream churn. Not sure it is worth it

Looking at the previous discussion, I think that people's opinions differ on this one. However, I would like to touch on the phrasing of this comment. The very fact that someone spent the time to make this change means that they do think it is necessary. Thus, in my opinion, stating that it is unnecessary in such absolutist terms is disrespectful. The same point could have been achieved by saying something like "I disagree with this change because <your argumentation>". Even if you know Alex personally, and you're sure he wouldn't mind the statement, this is still a public forum where others (maybe potential future contributors) will read these discussions. I think we should be more careful in how we communicate, and make sure that this forum is an inclusive place where others' opinions are respected.

Acknowledged. Will try to phrase in less absolutist terms.

As a meta-point: I'm concerned about the push-back on improving things based on churn. The comment about users wanting stability can only be an argument about spending more time on design and being more thorough before adding anything to the codebase. However that means much less velocity and higher cost to experiment. So far this isn't the tradeoff we've been making in MLIR I believe.

I am not concerned about churn. Churn is part of working at HEAD with MLIR. But that doesn't mean all churn is good. There was a lot of churn when ops were split out of the standard dialect, but that was done deliberately to get to a better end state. I am not convinced this is a better end state.

Sure, we agree here, but it seems you're not answering the part of my comment I expected: this was a meta point about churn-based arguments specifically. There is no question we need to focus on the merits of the change in itself! Let's continue below.

It is a parser/printer change that is made with the argument of increasing readability. Firstly, I am not sure it does (now you need to know that the outs operand type corresponds to the result type, and it only works for named ops, not linalg.generic).

Right: but I claimed earlier that this seems consistent with most other dialects, at least upstream.

Second, if it is not a clear win, and in the absence of more justification, I would vote for the status quo so that we don't subject all downstream users to having to update their lit tests for seemingly no benefit.

Absolutely: we shouldn't change things for no benefit. Can we focus on why you don't think there is a benefit here?
To start with, I see at a minimum design consistency with the general dialect design I have seen upstream so far.

Saying that there are other changes that did this is not justification enough, IMO. There were changes that were also blocked (Jeff wanted to make a change that added a comma to some arith dialect ops; there was also a discussion on changing outs to init for Linalg ops that I wasn't sure was worth it, for the same reason, even though I prefer init).

I don't see any reason why linalg ops need to be readable.

Can you elaborate? I'm surprised, because taken at face value, this comment is quite against the concept of custom printers in MLIR itself!

Ok, fair point. Narrowing the scope of my comment: I think this change does not increase the readability of Linalg ops. I understand personal preference, but I am mostly arguing for the status quo when the distinction comes down to personal preferences.

Yeah, I'd like to go beyond personal preference here and find some guidelines/design principles we can anchor to (now and in the future) when discussing custom syntax.
I think the best outcome here would be a new guidelines document on the website :)

Eliding return types when they are coupled is quite the norm in upstream dialects, I believe; what makes it different in your view here?

If the ops already do this, that is the status quo. I am just saying I don't think there is enough justification to change the status quo.

I don’t get your point here: this goes back to my meta point before.
If the op does not do it because it wasn’t well considered when introduced, the issue is in the original review instead of now.

To start with, I see at a minimum design consistency with the general dialect design I have seen upstream so far.

Linalg ops syntax is

%foo = linalg.<some-op> ins(... : <list of input types>) outs(... : <list of output types>) -> <return types>

Ops AFAIK in general are

%foo = <some op> : <input types> -> <result type>

for the most part... So linalg ops are not consistent anyway. If we want to evolve the Linalg op syntax, then can we do it in one shot, to a state that has general consensus? Also, in terms of consistency, these changes are only for named Linalg ops and not linalg.generic, which by itself is not consistent.

For instance, developers often store models/IR in linalg format to be able to work in isolation from the front end. Those kinds of changes force them to regenerate models.

I think this is the only valid argument against landing this PR in this whole thread.

I am also curious why linalg was used as a storage format. Are there many clients/models that use it instead of higher level dialects?

For instance, developers often store models/IR in linalg format to be able to work in isolation from the front end. Those kinds of changes force them to regenerate models.

I think this is the only valid argument against landing this PR in this whole thread.

I am also curious why linalg was used as a storage format. Are there many clients/models that use it instead of higher level dialects?

My understanding is that it allows keeping a representation of the model independent of importers/front ends.

Adding a few people who had problems in the past with breaking linalg changes so that they can weigh in and give more details on why having a more stable linalg representation would help:
@Benoit @harsh @powderluv

We at nod.ai ship / serve terabytes of Linalg IR every day (and it is increasing) as part of SHARK (https://github.com/nod-ai/SHARK). We have end users on metered connections (or offline), for whom we try our best to reduce churn. Stable Diffusion requires shipping 6GB of linalg IR for FP16 (per model variant), and we currently support at least 6 variants. In early December we switched back to textual representation (at the cost of storage space) to provide better stability of the IR. We understand the cost of living on HEAD and having to regenerate our IR (we regenerate all the SHARK tank models every night, but if an LLVM bump fails we stay on the older IR until it is fixed across the stack in LLVM, torch-mlir, IREE, SHARK, etc.).

We have no opinion on this PR but just want to highlight downstream usage and the cost of an IR change that trickles down. Happy to share any other useful information if it helps inform the design.

For instance, developers often store models/IR in linalg format to be able to work in isolation from the front end. Those kinds of changes force them to regenerate models.

I think this is the only valid argument against landing this PR in this whole thread.

I would also hope the same consideration would be given to downstream users who have lit tests written using Linalg. I explicitly didn't use this example because, in reality, shipping models in IR is not the recommended way (Linalg is not a stable IR), but writing lit tests for transformations downstream when your compiler is based on Linalg is legitimate. Hope the churn there is considered when making textual changes.

(Btw, I was admonished earlier for using absolutist language in comments. The same should be applied here. All comments on this thread are from folks who have explicitly taken the time to review and participate in the discussion, at the expense of other work/personal time. Please do respect the time taken by all members to review the change.)

I am also curious why linalg was used as a storage format. Are there many clients/models that use it instead of higher level dialects?

Benoit added a comment. Edited · Jan 20 2023, 4:26 AM

I would like to second what Thomas, powderluv and Mahesh have been saying. Also, while the provided regex would indeed take care of the majority of cases, my concern is more about the discoverability here. Once you know that this change is the reason for the compiler errors you're getting, the regex means you're close to having fixed it already (although, powderluv's story shows that 'close' may still involve re-distributing multi-gigabyte files). But for an average user finding out that their usually successful commands have started to fail, how long will it take them to understand that that is the problem? The last time it happened, it took me hours in GDB, because I was getting cryptic errors and as a compiler developer, my first guess was that something was wrong in my own in-progress code. This could have been disproved easily by retrying without my local changes, but I wasn't expecting that to be broken already without my local changes. This is, I think, just the way most of us work. What would really help in instances like this is if we had better error messages. Particularly in a compatibility-break case like here, an error message explicitly calling out the breaking change, linking to this Review, such as error: as of D141804, linalg ops no longer have -> return type.

For instance, developers often store models/IR in linalg format to be able to work in isolation from the front end. Those kinds of changes force them to regenerate models.

I think this is the only valid argument against landing this PR in this whole thread.

I am also curious why linalg was used as a storage format. Are there many clients/models that use it instead of higher level dialects?

My understanding is that it allows keeping a representation of the model independent of importers/front ends.

Adding a few people who had problems in the past with breaking linalg changes so that they can weigh in and give more details on why having a more stable linalg representation would help:
@Benoit @harsh @powderluv

So far there has been no requirement for stability on MLIR dialects, and in fact they change all the time, often in significant and non-trivial ways. The pretty-printed format is not even remotely close to being a stable storage format, and has never been advertised as such. I've been on the receiving end of this churn for a long time and I'm fine with dealing with it for the sake of progress. If stability has become more important for you, we should have a forum discussion and ideally turn it into some kind of written policy, so we don't have to repeat the same discussion over and over again on random code reviews.

This discussion in particular has a strong smell of "churn is fine, as long as *I* do it", which we should avoid at all costs.

Thank you @powderluv for describing your use case. I was not aware that linalg is used as a storage/transit format.

When we discussed creating StableHLO as a storage format for ML programs, we had a very similar discussion: should we simply turn mhlo, our compiler IR, into the stable format and use that to store programs, or was there value in separating the storage IR into its own dialect? We went with the latter, and one reason was to keep our flexibility to morph the compiler IR without impacting use cases like yours. StableHLO even provides some forward/backward compatibility, so that we can transform IR forward in an automated way (or enable scenarios like the error messages that @Benoit mentioned). This all comes at a cost, but we decided it worthwhile to pay for the hlo family of dialects.

Which route should we take for linalg? We need to retain reasonable flexibility in changing the IR. Should we discuss formalizing a variant of linalg that is stable? Or would it be possible to migrate your use case to a format that already has certain stability guarantees (StableHLO, TOSA, ...)? For the former, we should start a discussion on Discourse. For the latter, the IREE team might be more appropriate to advise.

Also @nicolasvasilache for visibility.

@herhut the reason we chose Linalg, like @ThomasRaoux mentioned, is that it gives us an abstraction from the frontend and the backend. Currently in SHARK we lower from various frontends (TF, JAX (HLO), PyTorch (torch dialect), TFLite->TOSA) into Linalg, and then we target whatever backend the end user wants to use. We could technically choose one of the higher-level IRs like StableHLO and then write translators to other IRs, but in the end they all come down to Linalg anyway for us (we only codegen). We want the ecosystem to grow and not be burdened, so we are willing to take on any "churn for progress" whenever it happens. Some stability would be great as it evolves, but I think that will happen as it matures. We, at least for our use case, don't need backward/forward compatibility yet. Happy to discuss more. Thanks for hearing our use case.

In early December we switched back to textual representation (at the cost of storage space) to provide better stability of the IR.

How is this supposed to be more stable? What were you using before?
The bytecode is intended to be more stable than the textual IR (even though it is not resilient to dialect changes at the moment, pending some versioning work).

To start with, I see at a minimum design consistency with the general dialect design I have seen upstream so far.

Linalg ops syntax is

%foo = linalg.<some-op> ins(... : <list of input types>) outs(... : <list of output types>) -> <return types>

Ops AFAIK in general are

%foo = <some op> : <input types> -> <result type>

for the most part... So linalg ops are not consistent anyway. If we want to evolve the Linalg op syntax, then can we do it in one shot, to a state that has general consensus? Also, in terms of consistency, these changes are only for named Linalg ops and not linalg.generic, which by itself is not consistent.

Makes sense, let's work towards this! Shall we start a shared doc and go through IR samples, including various dialects, and try to put together a "style guide for dialect syntax"?

I think this may be a case of socialization / familiarity with that part of the system.
I haven't personally looked at how to use the bytecode; I see there is documentation, but should we have something in the tutorial?
Serializing strings always seems tantalizingly easy until the castle built on sand starts to crumble.

@mehdi_amini that would be fantastic!
This needs a serious owner to really lift these ops from their "assembly-level" status and I have not been able to do that part.
One thing I have always been frustrated about is the syntax delta between a linalg.generic and a simple math expression.
@stellaraccident and @gysit provided huge improvements to make it better at the python level but the IR remains very much underserved.

The ability to "see" computations as e.g. %C(i, j) = arith.addf(arith.mulf(%A(i, k), %B(k, j)), %C(i, j)) brings a phase shift in cognitive capacity.
See "Fig. 6. TC Benchmarks used in the experiments." in https://dl.acm.org/doi/pdf/10.1145/3355606.
It was e.g. very useful to understand the anti-diagonal reduction pattern on the LHS operands in the kronecker3 example.
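To make the syntax delta concrete, this is roughly what that one-line matmul update looks like spelled as a linalg.generic today (a sketch of the standard upstream spelling; shapes are illustrative):

```mlir
#map_a = affine_map<(i, j, k) -> (i, k)>
#map_b = affine_map<(i, j, k) -> (k, j)>
#map_c = affine_map<(i, j, k) -> (i, j)>

// %C(i, j) = addf(mulf(%A(i, k), %B(k, j)), %C(i, j)) as a linalg.generic.
%0 = linalg.generic
       {indexing_maps = [#map_a, #map_b, #map_c],
        iterator_types = ["parallel", "parallel", "reduction"]}
       ins(%A, %B : tensor<4x8xf32>, tensor<8x16xf32>)
       outs(%C : tensor<4x16xf32>) {
  ^bb0(%a: f32, %b: f32, %c: f32):
    %m = arith.mulf %a, %b : f32
    %s = arith.addf %c, %m : f32
    linalg.yield %s : f32
} -> tensor<4x16xf32>
```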

@mehdi_amini and there is another aspect you can see in these examples that we have been kicking the can on: there is a strong need for a generic mechanism to represent sequences of these things (i.e. kronecker3 wraps 3 linalg.generic).
IREE has started creating new one-off named ops (softmax and flash_attention so far) and it really simplifies parts of the system.
Without a generic mechanism however, these require large amounts of 1-off C++.
The good thing though is that things compose hierarchically (i.e. softmax is messy, but flash_attention is softmax(matmul(softmax))-ish).

Would be thrilled to discuss these aspects deeper too with you if you're interested.
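To illustrate the "softmax(matmul)-ish" composition mentioned above, here is a minimal NumPy sketch; the function names and shapes are illustrative, not the actual IREE ops:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating, for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # The composition: scores = q @ k^T (scaled), row-wise softmax,
    # then a second matmul against v.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out = attention(q, k, v)
print(out.shape)  # prints (4, 8)
```

Each step here is itself a named-op candidate, which is exactly why a generic mechanism for representing such sequences would pay off.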

We were also crossing project boundaries, with both projects updating to LLVM at a different pace. We needed torch-mlir to be updated to the latest LLVM release (that was delayed by two weeks, though it is on a weekly cadence which usually roughly lines up with IREE's LLVM updates). We had to ship something that week, so textual IR seemed to work. We have since switched back to bytecode IR. One other thing to note: though the bytecode was supposed to be 4x smaller than the textual IR, we currently only see a ~2x reduction (a 3.3GB Unet got down to 1.6GB). We haven't gotten to investigating why we couldn't get everything into bytecode, but if you look at the IR (https://storage.googleapis.com/shark_tank/latest/unet64_512_512_fp16_stabilityai_stable_diffusion_2_1_base/unet64_512_512_fp16_stabilityai_stable_diffusion_2_1_base_torch.mlir) we still have a lot of text. Another factor to consider: having it in bytecode reduced compile times by 2x on the system I tested on.

Lots of discussion on this thread that I am not responding to in this comment -- I am just narrowly responding to the thing I was plussed in on. Having done some of the ergonomic work, I think that some of this is a bridge too far to expect from a low-level compiler IR, and I've not personally seen the benefit of going to an end state of a pure math representation in the abstract. I'm +1 on readability and compactness improvements, especially if they come with some design thought that gets them closer to a good end state and aren't just going to oscillate a bunch of times based on preference. Churn is the price we pay to move forward. I would like to actually move forward, though, not just oscillate on preferences or non-goals, and that is how I evaluate things.

I think the "mathy" and C++ vs DSL preference for op definitions is orthogonal to this discussion and, in hindsight, may even be a non goal for this layer of representation. It feels like something that would be much better homed at a higher level, personally. The core IR does not need to solve all concerns in such a system, imo.

Good point re. "The core IR does not need to solve all concerns in such a system, imo."
This does indeed point towards higher-level layers, thanks for sharing your insights!