This is an archive of the discontinued LLVM Phabricator instance.

Differential D156519

[mlir][VectorOps] Use SCF for vector.print and allow scalable vectors
Closed, Public

Authored by benmxwl-arm on Jul 28 2023, 3:48 AM.

Details

Reviewers

awarzynski
c-rhodes
MatsPetersson
aartbik
ftynse
dcaballe
kuhar
nicolasvasilache
herhut
rriddle

Commits

rGf36e909da037: [mlir][VectorOps] Use SCF for vector.print and allow scalable vectors
rG490dae26cb3b: [mlir][VectorOps] Use SCF for vector.print and allow scalable vectors
rG3875804a0725: [mlir][VectorOps] Use SCF for vector.print and allow scalable vectors

Summary

This patch splits the lowering of vector.print into first converting
an n-D print into a loop of scalar prints of the elements, then a second
pass that converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.

The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.

To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:

vector.print <comma>

lowers to

llvm.call @printComma() : () -> ()

The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).
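
As a rough illustration of the split described above (a Python model, not the actual MLIR passes): the first stage walks the vector and emits only scalar prints and punctuation-only prints, and the second stage maps each of those to a runtime call. The function names and the exact punctuation strings are assumptions, chosen to reproduce the "( 0.0, 0.0, 0.0, 0.0 )" format shown in the op documentation.

```python
# Stage 1 (models VectorToSCF): walk an n-D vector, given here as nested
# lists, and emit scalar prints plus punctuation-only prints.
def lower_print_to_events(vector):
    """Return a flat list of (kind, payload) events for `vector`."""
    events = []

    def walk(value):
        if not isinstance(value, list):
            events.append(("scalar", value))
            return
        events.append(("punctuation", "open"))
        for i, item in enumerate(value):
            walk(item)
            if i != len(value) - 1:  # no comma after the last element
                events.append(("punctuation", "comma"))
        events.append(("punctuation", "close"))

    walk(vector)
    # The default punctuation after the whole vector is a newline.
    events.append(("punctuation", "newline"))
    return events


# Stage 2 (models VectorToLLVM): map each event to what a runtime call
# would write. These strings are assumptions, not taken from the patch.
RUNTIME_TEXT = {"open": "( ", "close": " )", "comma": ", ", "newline": "\n"}


def run_events(events):
    """Concatenate the text each event would print."""
    out = []
    for kind, payload in events:
        if kind == "punctuation":
            out.append(RUNTIME_TEXT[payload])
        else:
            out.append(str(payload))
    return "".join(out)
```

Because stage 1 never mentions a runtime function, only stage 2 needs to know about names such as printComma() or printNewline().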

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

benmxwl-arm created this revision. Jul 28 2023, 3:48 AM

Herald added a reviewer: aartbik. Jul 28 2023, 3:48 AM

Herald added a reviewer: ftynse.

Herald added a reviewer: aartbik.

Herald added a reviewer: dcaballe.

Herald added subscribers: gysit, Dinistro, bviyer and 27 others.

Herald added a reviewer: kuhar. Jul 28 2023, 3:48 AM

Herald added a project: Restricted Project.

benmxwl-arm requested review of this revision. Jul 28 2023, 3:48 AM

Herald added a reviewer: nicolasvasilache. Jul 28 2023, 3:48 AM

Herald added a project: Restricted Project.

Herald added subscribers: alextsao1999, stephenneuendorffer, nicolasvasilache.

Harbormaster completed remote builds in B248802: Diff 545091. Jul 28 2023, 4:11 AM

Rebase, correct comments, and update some missed tests.

Herald added a reviewer: herhut. Jul 28 2023, 5:37 AM

Herald added a subscriber: csigg.

benmxwl-arm edited the summary of this revision. Jul 28 2023, 5:38 AM

benmxwl-arm edited the summary of this revision.

Harbormaster completed remote builds in B248815: Diff 545115. Jul 28 2023, 5:56 AM

I can see one test failure:

FAIL: MLIR :: Integration/Dialect/LLVMIR/CPU/X86/test-inline-asm-vector.mlir (210 of 2225)
******************** TEST 'MLIR :: Integration/Dialect/LLVMIR/CPU/X86/test-inline-asm-vector.mlir' FAILED ********************
Script:
--
: 'RUN: at line 1';   /llvm/relass/bin/mlir-opt /llvm/llvm-project/mlir/test/Integration/Dialect/LLVMIR/CPU/X86/test-inline-asm-vector.mlir -convert-vector-to-llvm |   /llvm/relass/bin/mlir-cpu-runner -e entry_point_with_all_constants -entry-point-result=void    -shared-libs=/llvm/relass/lib/libmlir_c_runner_utils.so
--
Exit Code: 1

Command Output (stderr):
--
loc("<stdin>":4:5): error: Dialect `vector' not found for custom op 'vector.print' 
could not parse the input IR

Thanks for the patch Ben, this will improve tests like the one added in D155839 and mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir that print scalable vectors. I've left a few minor comments but otherwise LGTM, aside from the test failure @kuhar mentioned. Please allow time for others to review.

mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
1420–1429	nit: splits
mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp
716	nit: a comment here like "shape cast to rank 1" would be helpful
mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir
566	nit: actual names like `IS_NOT_LAST_ITER` would be helpful in the tests

This revision is now accepted and ready to land. Jul 31 2023, 2:58 AM

Fix test-inline-asm-vector.mlir (thanks for the pointer)
Always extend odd integer sizes (non-pow2 or < i8) to avoid backend issues
Use actual names in tests
Fix nits
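
The integer-width rule in the second item above can be sketched as a small predicate. This is a Python illustration only: the function names are hypothetical, and the choice of target width (the next power of two, and at least 8) is an assumption, since the update note says only that such widths are extended.

```python
def needs_extension(width):
    """True for widths that are not a power of two or are narrower than 8 bits."""
    is_pow2 = width > 0 and (width & (width - 1)) == 0
    return (not is_pow2) or width < 8


def extended_width(width):
    """Width to print at. Assumed target: smallest power of two >= max(width, 8)."""
    if not needs_extension(width):
        return width
    target = 8
    while target < width:
        target *= 2
    return target
```

Under these assumptions an i4 or i1 value would be printed as i8, an i24 as i32, and i8/i16/i32/i64 would be left alone.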

Herald added a reviewer: rriddle. Jul 31 2023, 7:14 AM

benmxwl-arm marked 3 inline comments as done. Jul 31 2023, 7:17 AM

Harbormaster completed remote builds in B249206: Diff 545643. Jul 31 2023, 8:27 AM

Thanks for working on this @benmxwl-arm, this is a very nice and a very welcome improvement for scalable vectors!

a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:
vector.print <comma>
lowers to
llvm.call @printComma() : () -> ()

While this avoids leaking the LLVM details higher up, it requires vector.print to evolve into a bit of a "Swiss knife" that can print Vector(s) and other things too :) It would be good to get an opinion from vector.print veterans - perhaps @aartbik or @rriddle?

mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
2483	Why `Optional`? So that `vector.print` can print punctuation without anything else?
mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp
675	Why not `llvm.call @printComma() : () -> ()`?
mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir
623	How about 2d scalable vectors?
mlir/test/Integration/Dialect/LLVMIR/CPU/X86/test-inline-asm-vector.mlir
6	Which conversion fails if you keep this as `llvm.func`?
mlir/test/Integration/Dialect/Vector/CPU/test-compress.mlir
1	This is a bit unrelated. We should either switch all tests to `-test-lower-to-llvm` or none. IMO, it would be safer to keep the original pipeline. Switching to `-test-lower-to-llvm` could be proposed separately.

While this avoids leaking the LLVM details higher up, it requires vector.print to evolve into a bit of a "Swiss knife" that can print Vector(s) and other things too :) It would be good to get an opinion from vector.print veterans - perhaps aartbik or rriddle?

Note that vector.print already is a bit of a "swiss knife" as it supports printing scalars already (and is used for non-vector things), so it does not seem like a huge stretch :)

mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
2483	Yes, it has to be able to be used alone for the VectorToSCF pass, see the test outputs :)
mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp
675	Because that'd be introducing LLVM specifics before lowering to LLVM, which does not seem right (and is not done elsewhere).
mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir
623	I could add a test for the SCF output here (which I believe is valid), though currently >= 2D scalables can't be lowered any further.
mlir/test/Integration/Dialect/LLVMIR/CPU/X86/test-inline-asm-vector.mlir
6	I may be lowering it incorrectly, but it gets stuck with some `index` types and `cf.br`s that fail to lower to LLVM following something like: `-convert-vector-to-scf -convert-scf-to-cf -convert-cf-to-llvm -convert-vector-to-llvm -convert-arith-to-llvm -reconcile-unrealized-casts`
mlir/test/Integration/Dialect/Vector/CPU/test-compress.mlir
1	It's mainly because these tests use memrefs now, and hand-rolling the pipeline is quite unwieldy (and the original can't be kept anyway).

Note that if I removed the punctuation attribute, the entire lowering would need to be done in VectorToSCF, directly calling the runtime functions, since individual calls to vector.print would add newlines after each element (currently disabled via the punctuation attribute), which would be incorrect when not printing a scalar.

Thanks for adding this. I am quite happy that (1) this supports scalable vectors now and (2) using loops instead of unrolling will actually avoid some of the code size issues we were seeing.
I would like to see a bit more explanation in the vector op documentation itself, please.

mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
2483	It feels like the documentation needs to be updated a bit more on this. We now, if I understand correctly, have the "pure vector print that still implies punctuation" and then after lowering, we get into the semantics of printing the punctuation through the attribute and scalars. I understand why you picked this (not having the llvm dep early), but it is something that could use some explanation
2497–2499	can we add another example with punctuation also now?
2500–2501	broken is a bit ambiguous, can we use something like decomposed or so?
mlir/include/mlir/IR/BuiltinTypes.td
378 ↗	(On Diff #545643)	this feels like it should be in a separate CL
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
1419–1420	Now that you are here, this is no longer proof-of-concept, but simply "Lowering implementation ...."
mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-1d.mlir
1 ↗	(On Diff #545643)	this seems to repeat the pass?

Update vector.print documentation
Remove changes to BuiltinTypes.td
Add 2D scalable test case in vector-to-scf.mlir
Address miscellaneous comments :)

benmxwl-arm marked 7 inline comments as done. Aug 1 2023, 3:25 AM

Harbormaster completed remote builds in B249440: Diff 545990. Aug 1 2023, 3:40 AM

aartbik accepted this revision. Aug 1 2023, 12:41 PM

aartbik added inline comments.

mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
1441	since you are not using the result of the cast, just use isa?

awarzynski added inline comments. Aug 2 2023, 1:50 AM

mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir
623	To me that's a good reason to completely disallow 2D scalables. If we are not going to use it, it's effectively dead code 🤔 . This way we are much clearer about what's supported and what isn't.

benmxwl-arm added inline comments. Aug 2 2023, 2:18 AM

mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir
623	It does not require extra code to support here (it comes as a part of already supporting n-D vectors + scalable vectors). It's just that currently 2D+ scalable vectors can't be lowered to LLVM, so the lowering actually fails before anything specific to `vector.print` (e.g. at the `arith.constant dense<0.0> : vector<[4]x[4]xf32>` in the test case).

LGTM, thanks! (% request for a few more comments)

mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp
682–683	I am realising that "dynamic" may mean different things depending on the context. I usually think of "dynamically" vs "statically" shaped tensors/memrefs. But that's not what you had in mind, is it? Could you elaborate a bit? In particular, what is the actual limitation? Perhaps with references to vector.extract vs vector.extractelement? You may also want to refer to: https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops/, and https://reviews.llvm.org/D155034.
mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir
623	This is a good point, but I would still be hesitant to allow `vector.print %arg0 : vector<[4]x[4]xf32>` anywhere. We know that it will never be lowered to anything useful (well, there's nothing on the horizon). In cases like this I try to follow the principle of least surprises :) Having said that, I appreciate that it's a bit weird to impose such limitations at the Vector dialect level. Please go with whichever approach you prefer.

Thanks for the reviews and approvals, however, I've decided to rework the patch a little.
Rather than have the spill to a memref, I now do a simple reshape of n-D vectors (which then allows indexing with SSA values via vector.extractelement).
This explicitly does not support 2D scalable vectors, but as those can't be lowered currently this is no loss.
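
To illustrate the reshape idea: once an n-D vector is viewed as 1-D, each n-D position corresponds to one flat position that can be computed from loop induction variables at runtime. A minimal Python sketch, assuming a row-major layout (the function name is hypothetical and this is not the code in the patch):

```python
def linearize(indices, shape):
    """Row-major flat index of the n-D position `indices` within `shape`."""
    flat = 0
    for index, size in zip(indices, shape):
        flat = flat * size + index
    return flat
```

For a 2x3 vector, position (1, 2) maps to flat index 5, so a loop nest over (i, j) can extract element `i * 3 + j` from the reshaped 1-D vector.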

The main advantage of this is it keeps the lowering pipeline simple, so I've now removed the few -test-lower-to-llvms I added.

This did however uncover a bug in the vector.extract folds, so this patch now depends on D157003.

benmxwl-arm marked 5 inline comments as done. Aug 3 2023, 7:41 AM

benmxwl-arm added a parent revision: D157003: [mlir][VectorOps] Fix folding of vector.extract from stretch vector.broadcast.

Harbormaster completed remote builds in B250066: Diff 546863. Aug 3 2023, 9:13 AM

Fix an accidental use-after-free
Rebase on main

Harbormaster completed remote builds in B250326: Diff 547220. Aug 4 2023, 8:42 AM

Thanks for the reviews and approvals, however, I've decided to rework the patch a little.

Fantastic refactor, thank you Ben! This simplifies your original implementation and makes it less intrusive (thinking about the tests). I've left one small comment, but otherwise re-LGTM!

Btw, as this latest update doesn't really affect the overall direction/logic (which has already been approved by 3 reviewers), I think that it's fine to land it without waiting for further re-approvals ;-)

mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp
721–728	I would bump this to the very top. In general, it's good to have these sorts of assumptions documented and enforced quite early on (as opposed to being mixed with regular code). Not always possible, but should be fine in this case.

Closed by commit rG3875804a0725: [mlir][VectorOps] Use SCF for vector.print and allow scalable vectors (authored by benmxwl-arm). Aug 9 2023, 2:39 AM

This revision was automatically updated to reflect the committed changes.

benmxwl-arm added a commit: rG3875804a0725: [mlir][VectorOps] Use SCF for vector.print and allow scalable vectors.

benmxwl-arm added a reverting change: rGb160442dd2ca: Revert "[mlir][VectorOps] Use SCF for vector.print and allow scalable vectors". Aug 9 2023, 2:55 AM

Reverted due to test failures in the MLIR python bindings: https://lab.llvm.org/buildbot/#/builders/220/builds/25791
Will address those before re-landing.

benmxwl-arm added a commit: rG490dae26cb3b: [mlir][VectorOps] Use SCF for vector.print and allow scalable vectors. Aug 9 2023, 4:47 AM

Relanded after updating the few test failures I could find (Python bindings and a few CUDA/GPU tests). I've tried to go over all the tests I can find (though I am mindful there are hardware tests I can't run).

mehdi_amini added a reverting change: rG1b272d21c816: Revert "[mlir][VectorOps] Use SCF for vector.print and allow scalable vectors". Aug 9 2023, 7:37 PM

Reverted because of the broken test. You tried to fix it as a follow-up, but that didn't work: it actually made it worse. The fix wasn't correct anyway; the problem is a parsing error where vector.print <open> wouldn't parse back.

$ echo 'vector.print <open>' | bin/mlir-opt
<stdin>:1:13: error: expected operation name in quotes
vector.print <open>
            ^

mehdi_amini added inline comments. Aug 9 2023, 7:56 PM

mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
1436–1437	There isn't necessarily a parent module op I believe.

Sorry for the inconvenience and thanks for reverting (sorry I missed the continued failures).

I see the issue with the ambiguous assembly format and have a fix for that.
I'm just now trying to ensure I can at least run the pipeline for gpu-to-cubin locally to verify it will build.

Hi @mehdi_amini,

In D156519#4575277, @mehdi_amini wrote:

Reverted because of the broken test

Sorry about the inconvenience.

We are actually struggling to reproduce the failure in gpu-to-cubin.mlir. These CUDA integration tests use pipelines (e.g. gpu-to-cubin) that just won't build without CUDA runtime/drivers (which we don't have). Perhaps we're missing something obvious, but so far this is proving quite tricky.

Do you know anyone able to and willing to share an mlir-print-ir-after-all dump with us? :) Perhaps there are some instructions somewhere on how to run these tests using free tools? Just to clarify - we only want to/need to compile these tests. We don't need to run them (i.e. with mlir-cpu-runner).

@benmxwl-arm will shortly update this PR with a fix for the ASM format, but "gpu-to-cubin.mlir" is likely to continue failing. I'm not sure how to resolve that without some external help.

benmxwl-arm reopened this revision. Aug 10 2023, 7:00 AM

This revision is now accepted and ready to land. Aug 10 2023, 7:00 AM

Check parent module exists (required for print, note that getting the parent this way is not changed from the previous implementation)
Ensure assembly format is round-trippable
Best guess fix for gpu-to-cubin.mlir (no luck testing this locally)

Herald added a subscriber: hanchung. Aug 10 2023, 7:01 AM

Harbormaster completed remote builds in B251683: Diff 549029. Aug 10 2023, 12:21 PM

We are actually struggling to reproduce the failure in gpu-to-cubin.mlir. These CUDA integration tests use pipelines (e.g. gpu-to-cubin) that just won't build without CUDA runtime/drivers (which we don't have). Perhaps we're missing something obvious, but so far this is proving quite tricky.

You can't run them, however you should be able to compile everything (assuming you enabled the NVPTX target).
You can just copy the test and invoke it from the regular test locations (the Integration Cuda test will be filtered if you don't have Cuda) or just call mlir-opt directly, for example there shouldn't be any issue doing:

bin/mlir-opt /mlir/test/Integration/GPU/CUDA/gpu-to-cubin.mlir  \
| bin/mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm,gpu-to-cubin))' \
| bin/mlir-opt -gpu-to-llvm

(this is the test RUN description)

Patched this revision and ran locally:

PASS: MLIR :: Integration/GPU/CUDA/gpu-to-cubin.mlir (166 of 2066)

Just try to push at a time you can keep an eye on the bot if you wanna be safe!

Sorry for the inconvenience and thanks for reverting (sorry I missed the continued failures).

It's very easy to miss, because the bot won't send a notification when it is already broken, only when it goes from green to red.

Thanks a lot for the help, and checking that test! I'll definitely be keeping a close eye this time.

The extra complication with the gpu-to-cubin pass is that it's gated under MLIR_ENABLE_CUDA_RUNNER and links with the CUDA libraries, which we didn't have.

It's very easy to miss, because the bot won't send a notification when it is already broken, only when it goes from green to red.

See that now 😅, I was expecting a notification.

Closed by commit rGf36e909da037: [mlir][VectorOps] Use SCF for vector.print and allow scalable vectors (authored by benmxwl-arm). Aug 11 2023, 2:30 AM

This revision was automatically updated to reflect the committed changes.

benmxwl-arm added a commit: rGf36e909da037: [mlir][VectorOps] Use SCF for vector.print and allow scalable vectors.

Revision Contents

Path	Size
mlir/
include/
mlir/
Dialect/
Vector/
IR/
VectorOps.td	67 lines
lib/
Conversion/
VectorToLLVM/
ConvertVectorToLLVM.cpp	185 lines
VectorToSCF/
VectorToSCF.cpp	175 lines
test/
Conversion/
VectorToLLVM/
vector-to-llvm.mlir	51 lines
VectorToSCF/
vector-to-scf.mlir	149 lines
Integration/
Dialect/
Arith/
CPU/
test-wide-int-emulation-compare-results-i16.mlir	5 lines
test-wide-int-emulation-constants-i16.mlir	4 lines
LLVMIR/
CPU/
X86/
test-inline-asm-vector.mlir	18 lines
test-vp-intrinsic.mlir	5 lines
Vector/
CPU/
ArmSVE/
test-sve.mlir	2 lines
X86Vector/
test-dot.mlir	2 lines
test-mask-compress.mlir	2 lines
test-rsqrt.mlir	2 lines
test-vp2intersect-i32.mlir	2 lines
test-0-d-vectors.mlir	2 lines
test-broadcast.mlir	2 lines
test-compress.mlir	2 lines
test-constant-mask.mlir	2 lines
test-contraction.mlir	2 lines
test-create-mask-v4i1.mlir	2 lines
test-create-mask.mlir	2 lines
test-expand.mlir	2 lines
test-extract-strided-slice.mlir	2 lines
test-flat-transpose-col.mlir	2 lines
test-flat-transpose-row.mlir	2 lines
test-fma.mlir	2 lines
test-gather.mlir	2 lines
test-index-vectors.mlir	2 lines
test-insert-strided-slice.mlir	2 lines
test-maskedload.mlir	2 lines
test-maskedstore.mlir	2 lines
test-matrix-multiply-col.mlir	2 lines
test-matrix-multiply-row.mlir	2 lines
test-outerproduct-f32.mlir	2 lines
test-outerproduct-i64.mlir	2 lines
test-print-fp.mlir	2 lines
test-print-int.mlir	2 lines
test-realloc.mlir	4 lines
test-reductions-f32-reassoc.mlir	2 lines
test-reductions-f32.mlir	2 lines
test-reductions-f64-reassoc.mlir	2 lines
test-reductions-f64.mlir	2 lines
test-reductions-i32.mlir	2 lines
test-reductions-i4.mlir	2 lines
test-reductions-i64.mlir	2 lines
test-reductions-si4.mlir	2 lines
test-reductions-ui4.mlir	2 lines
test-scan.mlir	2 lines
test-scatter.mlir	2 lines
test-shape-cast.mlir	2 lines
test-shuffle.mlir	2 lines
test-shuffle16x16.mlir	5 lines
test-sparse-dot-matvec.mlir	2 lines
test-sparse-saxpy-jagged-matvec.mlir	2 lines
test-transpose.mlir	2 lines
GPU/
CUDA/
test-reduction-distribute.mlir	2 lines
test-warp-distribute.mlir	6 lines
mlir-cpu-runner/
math-polynomial-approx.mlir	2 lines
test-expand-math-approx.mlir	2 lines

Diff 545990

mlir/include/mlir/Dialect/Vector/IR/VectorOps.td

[...] def Vector_TransposeOp :
let assemblyFormat = [{		let assemblyFormat = [{
$vector `,` $transp attr-dict `:` type($vector) `to` type($result)		$vector `,` $transp attr-dict `:` type($vector) `to` type($result)
}];		}];
let hasCanonicalizer = 1;		let hasCanonicalizer = 1;
let hasFolder = 1;		let hasFolder = 1;
let hasVerifier = 1;		let hasVerifier = 1;
}		}

		def PrintPunctuation : I32EnumAttr<"PrintPunctuation",
		"Punctuation for separating vectors or vector elements", [
		I32EnumAttrCase<"NoPunctuation", 0, "no_punctuation">,
		I32EnumAttrCase<"NewLine", 1, "newline">,
		I32EnumAttrCase<"Comma", 2, "comma">,
		I32EnumAttrCase<"Open", 3, "open">,
		I32EnumAttrCase<"Close", 4, "close">
		]> {
		let cppNamespace = "::mlir::vector";
		let genSpecializedAttr = 0;
		}

		def Vector_PrintPunctuation : EnumAttr<Vector_Dialect, PrintPunctuation, "punctuation"> {
		let assemblyFormat = "`<` $value `>`";
		}

def Vector_PrintOp :		def Vector_PrintOp :
Vector_Op<"print", []>,		Vector_Op<"print", []>,
Arguments<(ins Type<Or<[		Arguments<(ins Optional<Type<Or<[
		awarzynski: Why `Optional`? So that `vector.print` can print punctuation without anything else?
		benmxwl-arm: Yes, it has to be able to be used alone for the VectorToSCF pass, see the test outputs :)
		aartbik: It feels like the documentation needs to be updated a bit more on this. We now, if I understand correctly, have the "pure vector print that still implies punctuation" and then after lowering, we get into the semantics of printing the punctuation through the attribute and scalars. I understand why you picked this (not having the llvm dep early), but it is something that could use some explanation
AnyVectorOfAnyRank.predicate,		AnyVectorOfAnyRank.predicate,
AnyInteger.predicate, Index.predicate, AnyFloat.predicate		AnyInteger.predicate, Index.predicate, AnyFloat.predicate
]>>:$source)> {		]>>>:$source, DefaultValuedAttr<Vector_PrintPunctuation,
		"::mlir::vector::PrintPunctuation::NewLine">:$punctuation)
		> {
let summary = "print operation (for testing and debugging)";		let summary = "print operation (for testing and debugging)";
let description = [{		let description = [{
Prints the source vector (or scalar) to stdout in human readable		Prints the source vector (or scalar) to stdout in a human-readable format
format (for testing and debugging). No return value.		(for testing and debugging). No return value.

Example:		Example:

```mlir		```mlir
%0 = arith.constant 0.0 : f32		%v = arith.constant dense<0.0> : vector<4xf32>
%1 = vector.broadcast %0 : f32 to vector<4xf32>		vector.print %v : vector<4xf32>
vector.print %1 : vector<4xf32>		```
		aartbik: can we add another example with punctuation also now?

when lowered to LLVM, the vector print is unrolled into		When lowered to LLVM, the vector print is decomposed into elementary
		aartbik: broken is a bit ambiguous, can we use something like decomposed or so?
elementary printing method calls that at runtime will yield		printing method calls that at runtime will yield:

		```
( 0.0, 0.0, 0.0, 0.0 )		( 0.0, 0.0, 0.0, 0.0 )
		```

on stdout when linked with a small runtime support library,		This is printed to stdout via a small runtime support library, which only
which only needs to provide a few printing methods (single		needs to provide a few printing methods (single value for all data
value for all data types, opening/closing bracket, comma,		types, opening/closing bracket, comma, newline).
newline).
		By default `vector.print` adds a newline after the vector, but this can be
		controlled by the `punctuation` attribute. For example, to print a comma
		after instead do:

		```mlir
		vector.print %v : vector<4xf32> #vector.punctuation<comma>
		```

		Note that it is possible to use the punctuation attribute alone. The
		following will print a single newline:

		```mlir
		vector.print #vector.punctuation<newline>
```		```
}];		}];
let extraClassDeclaration = [{		let extraClassDeclaration = [{
Type getPrintType() {		Type getPrintType() {
return getSource().getType();		return getSource().getType();
}		}
}];		}];
let assemblyFormat = "$source attr-dict `:` type($source)";		let builders = [
		OpBuilder<(ins "PrintPunctuation":$punctuation), [{
		build($_builder, $_state, {}, punctuation);
		}]>,
		];

		let assemblyFormat = "($source^ `:` type($source))? ($punctuation^)? attr-dict";
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Ops used for supporting progressive lowering and conversion type changes.		// Ops used for supporting progressive lowering and conversion type changes.
// The Ops are typically not used directly by higher level dialects, but are		// The Ops are typically not used directly by higher level dialects, but are
// used by intra-dialect rewriting rules to bring vector operations closer		// used by intra-dialect rewriting rules to bring vector operations closer
// to the hardware ISA.		// to the hardware ISA.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
[...]

mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp

[...]
#include "mlir/Target/LLVMIR/TypeToLLVM.h"		#include "mlir/Target/LLVMIR/TypeToLLVM.h"
#include "mlir/Transforms/DialectConversion.h"		#include "mlir/Transforms/DialectConversion.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include <optional>		#include <optional>

using namespace mlir;		using namespace mlir;
using namespace mlir::vector;		using namespace mlir::vector;

// Helper to reduce vector type by one rank at front.
static VectorType reducedVectorTypeFront(VectorType tp) {
assert((tp.getRank() > 1) && "unlowerable vector type");
return VectorType::get(tp.getShape().drop_front(), tp.getElementType(),
tp.getScalableDims().drop_front());
}

// Helper to reduce vector type by all but one rank at back.		// Helper to reduce vector type by all but one rank at back.
static VectorType reducedVectorTypeBack(VectorType tp) {		static VectorType reducedVectorTypeBack(VectorType tp) {
assert((tp.getRank() > 1) && "unlowerable vector type");		assert((tp.getRank() > 1) && "unlowerable vector type");
return VectorType::get(tp.getShape().take_back(), tp.getElementType(),		return VectorType::get(tp.getShape().take_back(), tp.getElementType(),
tp.getScalableDims().take_back());		tp.getScalableDims().take_back());
}		}

// Helper that picks the proper sequence for inserting.		// Helper that picks the proper sequence for inserting.
[...]

private:		private:
const bool force32BitVectorIndices;		const bool force32BitVectorIndices;
};		};

class VectorPrintOpConversion : public ConvertOpToLLVMPattern<vector::PrintOp> {		class VectorPrintOpConversion : public ConvertOpToLLVMPattern<vector::PrintOp> {
public:		public:
using ConvertOpToLLVMPattern<vector::PrintOp>::ConvertOpToLLVMPattern;		using ConvertOpToLLVMPattern<vector::PrintOp>::ConvertOpToLLVMPattern;

// Proof-of-concept lowering implementation that relies on a small		// Lowering implementation that relies on a small runtime support library,
		aartbik: Now that you are here, this is no longer proof-of-concept, but simply "Lowering implementation ...."
// runtime support library, which only needs to provide a few		// which only needs to provide a few printing methods (single value for all
// printing methods (single value for all data types, opening/closing		// data types, opening/closing bracket, comma, newline). The lowering splits
// bracket, comma, newline). The lowering fully unrolls a vector		// the vector into elementary printing operations. The advantage of this
// in terms of these elementary printing operations. The advantage		// approach is that the library can remain unaware of all low-level
// of this approach is that the library can remain unaware of all		// implementation details of vectors while still supporting output of any
// low-level implementation details of vectors while still supporting		// shaped and dimensioned vector.
// output of any shaped and dimensioned vector. Due to full unrolling,		//
// this approach is less suited for very large vectors though.		// Note: This lowering only handles scalars, n-D vectors are broken into
		// printing scalars in loops in VectorToSCF.
		c-rhodes (Done): nit: splits
//		//
// TODO: rely solely on libc in future? something else?		// TODO: rely solely on libc in future? something else?
//		//
LogicalResult		LogicalResult
matchAndRewrite(vector::PrintOp printOp, OpAdaptor adaptor,		matchAndRewrite(vector::PrintOp printOp, OpAdaptor adaptor,
ConversionPatternRewriter &rewriter) const override {		ConversionPatternRewriter &rewriter) const override {
		auto parent = printOp->getParentOfType<ModuleOp>();
		auto loc = printOp->getLoc();
		mehdi_amini (Not Done): There isn't necessarily a parent module op, I believe.

		if (auto value = adaptor.getSource()) {
Type printType = printOp.getPrintType();		Type printType = printOp.getPrintType();
		if (dyn_cast<VectorType>(printType)) {
		aartbik (Done): Since you are not using the result of the cast, just use isa?
		// Vectors should be broken into elementary print ops in VectorToSCF.
		return failure();
		}
		if (failed(emitScalarPrint(rewriter, parent, loc, printType, value)))
		return failure();
		}

		auto punct = printOp.getPunctuation();
		if (punct != PrintPunctuation::NoPunctuation) {
		emitCall(rewriter, printOp->getLoc(), [&] {
		switch (punct) {
		case PrintPunctuation::Close:
		return LLVM::lookupOrCreatePrintCloseFn(parent);
		case PrintPunctuation::Open:
		return LLVM::lookupOrCreatePrintOpenFn(parent);
		case PrintPunctuation::Comma:
		return LLVM::lookupOrCreatePrintCommaFn(parent);
		case PrintPunctuation::NewLine:
		return LLVM::lookupOrCreatePrintNewlineFn(parent);
		default:
		llvm_unreachable("unexpected punctuation");
		}
		}());
		}

		rewriter.eraseOp(printOp);
		return success();
		}

		private:
		enum class PrintConversion {
		// clang-format off
		None,
		ZeroExt64,
		SignExt64,
		Bitcast16
		// clang-format on
		};

		LogicalResult emitScalarPrint(ConversionPatternRewriter &rewriter,
		ModuleOp parent, Location loc, Type printType,
		Value value) const {
if (typeConverter->convertType(printType) == nullptr)		if (typeConverter->convertType(printType) == nullptr)
return failure();		return failure();

// Make sure element type has runtime support.		// Make sure element type has runtime support.
PrintConversion conversion = PrintConversion::None;		PrintConversion conversion = PrintConversion::None;
VectorType vectorType = dyn_cast<VectorType>(printType);
Type eltType = vectorType ? vectorType.getElementType() : printType;
auto parent = printOp->getParentOfType<ModuleOp>();
Operation *printer;		Operation *printer;
if (eltType.isF32()) {		if (printType.isF32()) {
printer = LLVM::lookupOrCreatePrintF32Fn(parent);		printer = LLVM::lookupOrCreatePrintF32Fn(parent);
} else if (eltType.isF64()) {		} else if (printType.isF64()) {
printer = LLVM::lookupOrCreatePrintF64Fn(parent);		printer = LLVM::lookupOrCreatePrintF64Fn(parent);
} else if (eltType.isF16()) {		} else if (printType.isF16()) {
conversion = PrintConversion::Bitcast16; // bits!		conversion = PrintConversion::Bitcast16; // bits!
printer = LLVM::lookupOrCreatePrintF16Fn(parent);		printer = LLVM::lookupOrCreatePrintF16Fn(parent);
} else if (eltType.isBF16()) {		} else if (printType.isBF16()) {
conversion = PrintConversion::Bitcast16; // bits!		conversion = PrintConversion::Bitcast16; // bits!
printer = LLVM::lookupOrCreatePrintBF16Fn(parent);		printer = LLVM::lookupOrCreatePrintBF16Fn(parent);
} else if (eltType.isIndex()) {		} else if (printType.isIndex()) {
printer = LLVM::lookupOrCreatePrintU64Fn(parent);		printer = LLVM::lookupOrCreatePrintU64Fn(parent);
} else if (auto intTy = dyn_cast<IntegerType>(eltType)) {		} else if (auto intTy = dyn_cast<IntegerType>(printType)) {
// Integers need a zero or sign extension on the operand		// Integers need a zero or sign extension on the operand
// (depending on the source type) as well as a signed or		// (depending on the source type) as well as a signed or
// unsigned print method. Up to 64-bit is supported.		// unsigned print method. Up to 64-bit is supported.
unsigned width = intTy.getWidth();		unsigned width = intTy.getWidth();
if (intTy.isUnsigned()) {		if (intTy.isUnsigned()) {
if (width <= 64) {		if (width <= 64) {
if (width < 64)		if (width < 64)
conversion = PrintConversion::ZeroExt64;		conversion = PrintConversion::ZeroExt64;
(14 lines not shown)
} else {		} else {
return failure();		return failure();
}		}
}		}
} else {		} else {
return failure();		return failure();
}		}

// Unroll vector into elementary print calls.
int64_t rank = vectorType ? vectorType.getRank() : 0;
Type type = vectorType ? vectorType : eltType;
emitRanks(rewriter, printOp, adaptor.getSource(), type, printer, rank,
conversion);
emitCall(rewriter, printOp->getLoc(),
LLVM::lookupOrCreatePrintNewlineFn(parent));
rewriter.eraseOp(printOp);
return success();
}

private:
enum class PrintConversion {
// clang-format off
None,
ZeroExt64,
SignExt64,
Bitcast16
// clang-format on
};

void emitRanks(ConversionPatternRewriter &rewriter, Operation *op,
Value value, Type type, Operation *printer, int64_t rank,
PrintConversion conversion) const {
VectorType vectorType = dyn_cast<VectorType>(type);
Location loc = op->getLoc();
if (!vectorType) {
assert(rank == 0 && "The scalar case expects rank == 0");
switch (conversion) {		switch (conversion) {
case PrintConversion::ZeroExt64:		case PrintConversion::ZeroExt64:
value = rewriter.create<arith::ExtUIOp>(		value = rewriter.create<arith::ExtUIOp>(
loc, IntegerType::get(rewriter.getContext(), 64), value);		loc, IntegerType::get(rewriter.getContext(), 64), value);
break;		break;
case PrintConversion::SignExt64:		case PrintConversion::SignExt64:
value = rewriter.create<arith::ExtSIOp>(		value = rewriter.create<arith::ExtSIOp>(
loc, IntegerType::get(rewriter.getContext(), 64), value);		loc, IntegerType::get(rewriter.getContext(), 64), value);
break;		break;
case PrintConversion::Bitcast16:		case PrintConversion::Bitcast16:
value = rewriter.create<LLVM::BitcastOp>(		value = rewriter.create<LLVM::BitcastOp>(
loc, IntegerType::get(rewriter.getContext(), 16), value);		loc, IntegerType::get(rewriter.getContext(), 16), value);
break;		break;
case PrintConversion::None:		case PrintConversion::None:
break;		break;
}		}
emitCall(rewriter, loc, printer, value);		emitCall(rewriter, loc, printer, value);
return;		return success();
}

auto parent = op->getParentOfType<ModuleOp>();
emitCall(rewriter, loc, LLVM::lookupOrCreatePrintOpenFn(parent));
Operation *printComma = LLVM::lookupOrCreatePrintCommaFn(parent);

if (rank <= 1) {
auto reducedType = vectorType.getElementType();
auto llvmType = typeConverter->convertType(reducedType);
int64_t dim = rank == 0 ? 1 : vectorType.getDimSize(0);
for (int64_t d = 0; d < dim; ++d) {
Value nestedVal = extractOne(rewriter, *getTypeConverter(), loc, value,
llvmType, /*rank=*/0, /*pos=*/d);
emitRanks(rewriter, op, nestedVal, reducedType, printer, /*rank=*/0,
conversion);
if (d != dim - 1)
emitCall(rewriter, loc, printComma);
}
emitCall(rewriter, loc, LLVM::lookupOrCreatePrintCloseFn(parent));
return;
}

int64_t dim = vectorType.getDimSize(0);
for (int64_t d = 0; d < dim; ++d) {
auto reducedType = reducedVectorTypeFront(vectorType);
auto llvmType = typeConverter->convertType(reducedType);
Value nestedVal = extractOne(rewriter, *getTypeConverter(), loc, value,
llvmType, rank, d);
emitRanks(rewriter, op, nestedVal, reducedType, printer, rank - 1,
conversion);
if (d != dim - 1)
emitCall(rewriter, loc, printComma);
}
emitCall(rewriter, loc, LLVM::lookupOrCreatePrintCloseFn(parent));
}		}

// Helper to emit a call.		// Helper to emit a call.
static void emitCall(ConversionPatternRewriter &rewriter, Location loc,		static void emitCall(ConversionPatternRewriter &rewriter, Location loc,
Operation *ref, ValueRange params = ValueRange()) {		Operation *ref, ValueRange params = ValueRange()) {
rewriter.create<LLVM::CallOp>(loc, TypeRange(), SymbolRefAttr::get(ref),		rewriter.create<LLVM::CallOp>(loc, TypeRange(), SymbolRefAttr::get(ref),
params);		params);
}		}
(129 lines not shown)
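
Editorial sketch (not part of the patch): the comment on VectorPrintOpConversion describes a runtime library that only provides elementary printing hooks, with the lowering composing them. The standalone C++ below models that call sequence for a 2-D integer vector. The `PrintLog` type, the `print2D` helper, and the exact strings each hook emits are assumptions made for illustration; only the hook names (printOpen, printClose, printComma, printNewline) come from the diff above, and `printI64` is assumed to exist alongside them.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the runtime support library. The emitted
// strings are an assumption of this sketch, not taken from the patch.
struct PrintLog {
  std::string out;
  void printOpen() { out += "( "; }
  void printClose() { out += " )"; }
  void printComma() { out += ", "; }
  void printNewline() { out += "\n"; }
  void printI64(int64_t v) { out += std::to_string(v); }
};

// Models the sequence of elementary calls for a 2-D vector: one
// open/close pair per rank, a comma after every element or row except
// the last, and a trailing newline.
std::string print2D(const std::vector<std::vector<int64_t>> &rows) {
  PrintLog log;
  log.printOpen();
  for (std::size_t i = 0; i < rows.size(); ++i) {
    log.printOpen();
    for (std::size_t j = 0; j < rows[i].size(); ++j) {
      log.printI64(rows[i][j]);
      if (j + 1 != rows[i].size())
        log.printComma();
    }
    log.printClose();
    if (i + 1 != rows.size())
      log.printComma();
  }
  log.printClose();
  log.printNewline();
  return log.out;
}
```

Under these assumptions, `print2D({{1, 2}, {3, 4}})` yields `( ( 1, 2 ), ( 3, 4 ) )` followed by a newline. The library never sees the vector shape; it only receives scalars and punctuation requests.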

mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp

(645 lines not shown)
xferOp.getMaskMutable().assign(buffers.maskBuffer);		xferOp.getMaskMutable().assign(buffers.maskBuffer);
});		});
}		}

return success();		return success();
}		}
};		};

		/// Decompose a n-D PrintOp into a loop of elementary/scalar prints. This allows
		/// printing both scalable and fixed size vectors.
		///
		/// E.g.:
		/// ```
		/// vector.print %v : vector<[4]xi32>
		/// ```
		/// is rewritten to:
		/// ```
		/// %c0 = arith.constant 0 : index
		/// %c4 = arith.constant 4 : index
		/// %c1 = arith.constant 1 : index
		/// %vscale = vector.vscale
		/// %length = arith.muli %vscale, %c4 : index
		/// %lastIndex = arith.subi %length, %c1 : index
		/// vector.print <open>
		/// scf.for %i = %c0 to %length step %c1 {
		/// %el = vector.extractelement %v[%i : index] : vector<[4]xi32>
		/// vector.print %el : i32 <no_punctuation>
		/// %notLastIndex = arith.cmpi ult, %i, %lastIndex : index
		/// scf.if %notLastIndex {
		/// vector.print <comma>
		awarzynski (Not Done): Why not `llvm.call @printComma() : () -> ()`?
		benmxwl-arm (Author, Done): Because that'd be introducing LLVM specifics before lowering to LLVM, which does not seem right (and is not done elsewhere).
		/// }
		/// }
		/// vector.print <close>
		/// vector.print
		/// ```
		///
		/// Note: A temporary buffer is allocated for rank > 1 vectors to allow dynamic
		/// indexing of elements.
		awarzynski (Done): I am realising that "dynamic" may mean different things depending on the context. I usually think of "dynamically" vs "statically" shaped tensors/memrefs. But that's not what you had in mind, is it? Could you elaborate a bit? In particular, what is the actual limitation? With e.g. references to vector.extract vs vector.extractelement? You may also want to refer to: https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops/, and https://reviews.llvm.org/D155034.
		struct DecomposePrintOpConversion : public VectorToSCFPattern<vector::PrintOp> {
		using VectorToSCFPattern<vector::PrintOp>::VectorToSCFPattern;
		LogicalResult matchAndRewrite(vector::PrintOp printOp,
		PatternRewriter &rewriter) const override {
		if (!printOp.getSource())
		return failure();

		VectorType vectorType = dyn_cast<VectorType>(printOp.getPrintType());
		if (!vectorType)
		return failure();

		auto loc = printOp.getLoc();
		auto value = printOp.getSource();

		if (auto intTy = dyn_cast<IntegerType>(vectorType.getElementType())) {
		// Oddly sized integers are (somewhat) buggy on a lot of backends, so to
		// avoid issues extend them to a more standard size.
		// https://github.com/llvm/llvm-project/issues/30613
		auto width = intTy.getWidth();
		auto legalWidth = llvm::NextPowerOf2(std::max(8u, width) - 1);
		auto legalIntTy = IntegerType::get(rewriter.getContext(), legalWidth,
		intTy.getSignedness());
		// arith can only take signless integers, so we must cast back and forth.
		auto signlessSourceVectorType =
		vectorType.cloneWith({}, getIntTypeWithSignlessSemantics(intTy));
		auto signlessTargetVectorType =
		vectorType.cloneWith({}, getIntTypeWithSignlessSemantics(legalIntTy));
		auto targetVectorType = vectorType.cloneWith({}, legalIntTy);
		value = rewriter.create<vector::BitCastOp>(loc, signlessSourceVectorType,
		value);
		if (width == 1 || intTy.isUnsigned())
		value = rewriter.create<arith::ExtUIOp>(loc, signlessTargetVectorType,
		value);
		c-rhodes (Done): nit: a comment here like "shape cast to rank 1" would be helpful
		else
		value = rewriter.create<arith::ExtSIOp>(loc, signlessTargetVectorType,
		value);
		value = rewriter.create<vector::BitCastOp>(loc, targetVectorType, value);
		vectorType = targetVectorType;
		}

		Value alloc;
		if (vectorType.getRank() > 1) {
		// Spill vector to allow dynamically indexing elements.
		alloc = vectorToMemref(rewriter, loc, value, vectorType);
		} else if (vectorType.getRank() == 0) {
		awarzynski (Done): I would bump this to the very top. In general, it's good to have these sorts of assumptions documented and enforced quite early on (as opposed to mixed in with regular code). Not always possible, but should be fine in this case.
		// Shape cast rank 0 vectors to rank 1.
		vectorType = VectorType::get({1}, vectorType.getElementType());
		value = rewriter.create<vector::ShapeCastOp>(loc, vectorType, value);
		}

		vector::PrintOp firstClose;
		SmallVector<Value, 8> loopIndices;
		auto shape = vectorType.getShape();
		auto scalableDimensions = vectorType.getScalableDims();
		for (unsigned d = 0; d < shape.size(); d++) {
		// Setup loop bounds and step.
		Value lowerBound = rewriter.create<arith::ConstantIndexOp>(loc, 0);
		Value upperBound = rewriter.create<arith::ConstantIndexOp>(loc, shape[d]);
		Value step = rewriter.create<arith::ConstantIndexOp>(loc, 1);
		if (scalableDimensions[d]) {
		auto vscale = rewriter.create<vector::VectorScaleOp>(
		loc, rewriter.getIndexType());
		upperBound = rewriter.create<arith::MulIOp>(loc, upperBound, vscale);
		}
		auto lastIndex = rewriter.create<arith::SubIOp>(loc, upperBound, step);

		// Create a loop to print the elements surrounded by parentheses.
		rewriter.create<vector::PrintOp>(loc, vector::PrintPunctuation::Open);
		auto loop =
		rewriter.create<scf::ForOp>(loc, lowerBound, upperBound, step);
		auto printClose = rewriter.create<vector::PrintOp>(
		loc, vector::PrintPunctuation::Close);
		if (!firstClose)
		firstClose = printClose;

		auto loopIdx = loop.getInductionVar();
		loopIndices.push_back(loopIdx);

		// Print a comma after all but the last element.
		rewriter.setInsertionPointToStart(loop.getBody());
		auto notLastIndex = rewriter.create<arith::CmpIOp>(
		loc, arith::CmpIPredicate::ult, loopIdx, lastIndex);
		rewriter.create<scf::IfOp>(loc, notLastIndex,
		[&](OpBuilder &builder, Location loc) {
		builder.create<vector::PrintOp>(
		loc, vector::PrintPunctuation::Comma);
		builder.create<scf::YieldOp>(loc);
		});

		rewriter.setInsertionPointToStart(loop.getBody());
		}

		// Print the scalar elements in the inner most loop.
		auto element = [&]() -> Value {
		if (alloc)
		return rewriter.create<memref::LoadOp>(loc, alloc, loopIndices);
		return rewriter.create<vector::ExtractElementOp>(loc, value,
		loopIndices.front());
		}();
		rewriter.create<vector::PrintOp>(loc, element,
		vector::PrintPunctuation::NoPunctuation);

		rewriter.setInsertionPointAfter(firstClose);
		rewriter.create<vector::PrintOp>(loc, printOp.getPunctuation());
		rewriter.eraseOp(printOp);
		return success();
		}

		static Value vectorToMemref(OpBuilder &builder, Location loc, Value value,
		VectorType type) {
		SmallVector<int64_t, 8> memrefShape;
		SmallVector<Value, 8> zeroIndices;
		SmallVector<Value, 8> memrefDynamicDimensions;
		auto shape = type.getShape();
		auto scalableDimensions = type.getScalableDims();
		auto zeroIndex = builder.create<arith::ConstantIndexOp>(loc, 0);
		for (unsigned d = 0; d < shape.size(); d++) {
		if (scalableDimensions[d]) {
		memrefShape.push_back(ShapedType::kDynamic);
		auto vscale =
		builder.create<vector::VectorScaleOp>(loc, builder.getIndexType());
		auto size = builder.create<arith::ConstantIndexOp>(loc, shape[d]);
		auto scaledSize = builder.create<arith::MulIOp>(loc, size, vscale);
		memrefDynamicDimensions.push_back(scaledSize);
		} else {
		memrefShape.push_back(shape[d]);
		}
		zeroIndices.push_back(zeroIndex);
		}
		auto memrefType = MemRefType::get(memrefShape, type.getElementType());
		auto alloc = builder.create<memref::AllocaOp>(loc, memrefType,
		memrefDynamicDimensions);
		builder.create<vector::TransferWriteOp>(loc, value, alloc,
		ValueRange(zeroIndices));
		return alloc;
		}

		static IntegerType getIntTypeWithSignlessSemantics(IntegerType intTy) {
		return IntegerType::get(intTy.getContext(), intTy.getWidth(),
		IntegerType::Signless);
		};
		};

/// Progressive lowering of vector transfer ops: Unpack one dimension.		/// Progressive lowering of vector transfer ops: Unpack one dimension.
///		///
/// 1. Unpack one dimension from the current buffer type and cast the buffer		/// 1. Unpack one dimension from the current buffer type and cast the buffer
/// to that new type. E.g.:		/// to that new type. E.g.:
/// ```		/// ```
/// %vec = memref.load %0[%1] : memref<5xvector<4x3xf32>>		/// %vec = memref.load %0[%1] : memref<5xvector<4x3xf32>>
/// vector.transfer_write %vec ...		/// vector.transfer_write %vec ...
/// ```		/// ```
(613 lines not shown)
patterns.getContext(), options);		patterns.getContext(), options);
}		}

if (options.targetRank == 1) {		if (options.targetRank == 1) {
patterns.add<lowering_1_d::TransferOp1dConversion<TransferReadOp>,		patterns.add<lowering_1_d::TransferOp1dConversion<TransferReadOp>,
lowering_1_d::TransferOp1dConversion<TransferWriteOp>>(		lowering_1_d::TransferOp1dConversion<TransferWriteOp>>(
patterns.getContext(), options);		patterns.getContext(), options);
}		}
		patterns.add<lowering_n_d::DecomposePrintOpConversion>(patterns.getContext(),
		options);
}		}

namespace {		namespace {

struct ConvertVectorToSCFPass		struct ConvertVectorToSCFPass
: public impl::ConvertVectorToSCFBase<ConvertVectorToSCFPass> {		: public impl::ConvertVectorToSCFBase<ConvertVectorToSCFPass> {
ConvertVectorToSCFPass() = default;		ConvertVectorToSCFPass() = default;
ConvertVectorToSCFPass(const VectorTransferToSCFOptions &options) {		ConvertVectorToSCFPass(const VectorTransferToSCFOptions &options) {
(30 lines not shown)
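
Editorial sketch (not part of the patch): DecomposePrintOpConversion widens oddly sized integer element types before printing, using `llvm::NextPowerOf2(std::max(8u, width) - 1)`, and picks zero extension for 1-bit or unsigned types and sign extension otherwise. The standalone C++ below reproduces that arithmetic without LLVM headers. `nextPowerOf2`, `legalPrintWidth`, and `usesZeroExtension` are hypothetical names for this illustration; `nextPowerOf2` is assumed to match the LLVM helper's behavior of returning the smallest power of two strictly greater than its argument.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Stand-in for llvm::NextPowerOf2 (assumption: smallest power of two
// strictly greater than `a`, for inputs small enough not to overflow).
uint64_t nextPowerOf2(uint64_t a) {
  uint64_t p = 1;
  while (p <= a)
    p <<= 1;
  return p;
}

// Width the element type is extended to before printing: at least 8 bits,
// rounded up to a power of two. Widths above 64 are outside this sketch.
unsigned legalPrintWidth(unsigned width) {
  assert(width >= 1 && width <= 64 && "sketch only covers 1..64 bits");
  return static_cast<unsigned>(nextPowerOf2(std::max(8u, width) - 1));
}

// Mirrors the branch in the patch: i1 and unsigned integers are
// zero-extended, everything else is sign-extended.
bool usesZeroExtension(unsigned width, bool isUnsigned) {
  return width == 1 || isUnsigned;
}
```

Subtracting one before rounding keeps widths that are already powers of two unchanged (8 stays 8, 16 stays 16), while 4 becomes 8 and 9 becomes 16.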

mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir

	(1,038 lines not shown)
	}			}
	// CHECK-LABEL: @vector_print_scalar_f64(			// CHECK-LABEL: @vector_print_scalar_f64(
	// CHECK-SAME: %[[A:.*]]: f64)			// CHECK-SAME: %[[A:.*]]: f64)
	// CHECK: llvm.call @printF64(%[[A]]) : (f64) -> ()			// CHECK: llvm.call @printF64(%[[A]]) : (f64) -> ()
	// CHECK: llvm.call @printNewline() : () -> ()			// CHECK: llvm.call @printNewline() : () -> ()

	// -----			// -----

	func.func @vector_print_vector_0d(%arg0: vector<f32>) {
	vector.print %arg0 : vector<f32>
	return
	}
	// CHECK-LABEL: @vector_print_vector_0d(
	// CHECK-SAME: %[[A:.*]]: vector<f32>)
	// CHECK: %[[T0:.*]] = builtin.unrealized_conversion_cast %[[A]] : vector<f32> to vector<1xf32>
	// CHECK: llvm.call @printOpen() : () -> ()
	// CHECK: %[[T1:.*]] = llvm.mlir.constant(0 : index) : i64
	// CHECK: %[[T2:.*]] = llvm.extractelement %[[T0]][%[[T1]] : i64] : vector<1xf32>
	// CHECK: llvm.call @printF32(%[[T2]]) : (f32) -> ()
	// CHECK: llvm.call @printClose() : () -> ()
	// CHECK: llvm.call @printNewline() : () -> ()
	// CHECK: return

	// -----

	func.func @vector_print_vector(%arg0: vector<2x2xf32>) {
	vector.print %arg0 : vector<2x2xf32>
	return
	}
	// CHECK-LABEL: @vector_print_vector(
	// CHECK-SAME: %[[A:.*]]: vector<2x2xf32>)
	// CHECK: %[[VAL_1:.*]] = builtin.unrealized_conversion_cast %[[A]] : vector<2x2xf32> to !llvm.array<2 x vector<2xf32>>
	// CHECK: llvm.call @printOpen() : () -> ()
	// CHECK: %[[x0:.*]] = llvm.extractvalue %[[VAL_1]][0] : !llvm.array<2 x vector<2xf32>>
	// CHECK: llvm.call @printOpen() : () -> ()
	// CHECK: %[[x1:.*]] = llvm.mlir.constant(0 : index) : i64
	// CHECK: %[[x2:.*]] = llvm.extractelement %[[x0]][%[[x1]] : i64] : vector<2xf32>
	// CHECK: llvm.call @printF32(%[[x2]]) : (f32) -> ()
	// CHECK: llvm.call @printComma() : () -> ()
	// CHECK: %[[x3:.*]] = llvm.mlir.constant(1 : index) : i64
	// CHECK: %[[x4:.*]] = llvm.extractelement %[[x0]][%[[x3]] : i64] : vector<2xf32>
	// CHECK: llvm.call @printF32(%[[x4]]) : (f32) -> ()
	// CHECK: llvm.call @printClose() : () -> ()
	// CHECK: llvm.call @printComma() : () -> ()
	// CHECK: %[[x5:.*]] = llvm.extractvalue %[[VAL_1]][1] : !llvm.array<2 x vector<2xf32>>
	// CHECK: llvm.call @printOpen() : () -> ()
	// CHECK: %[[x6:.*]] = llvm.mlir.constant(0 : index) : i64
	// CHECK: %[[x7:.*]] = llvm.extractelement %[[x5]][%[[x6]] : i64] : vector<2xf32>
	// CHECK: llvm.call @printF32(%[[x7]]) : (f32) -> ()
	// CHECK: llvm.call @printComma() : () -> ()
	// CHECK: %[[x8:.*]] = llvm.mlir.constant(1 : index) : i64
	// CHECK: %[[x9:.*]] = llvm.extractelement %[[x5]][%[[x8]] : i64] : vector<2xf32>
	// CHECK: llvm.call @printF32(%[[x9]]) : (f32) -> ()
	// CHECK: llvm.call @printClose() : () -> ()
	// CHECK: llvm.call @printClose() : () -> ()
	// CHECK: llvm.call @printNewline() : () -> ()

	// -----

	func.func @extract_strided_slice1(%arg0: vector<4xf32>) -> vector<2xf32> {			func.func @extract_strided_slice1(%arg0: vector<4xf32>) -> vector<2xf32> {
	%0 = vector.extract_strided_slice %arg0 {offsets = [2], sizes = [2], strides = [1]} : vector<4xf32> to vector<2xf32>			%0 = vector.extract_strided_slice %arg0 {offsets = [2], sizes = [2], strides = [1]} : vector<4xf32> to vector<2xf32>
	return %0 : vector<2xf32>			return %0 : vector<2xf32>
	}			}
	// CHECK-LABEL: @extract_strided_slice1(			// CHECK-LABEL: @extract_strided_slice1(
	// CHECK-SAME: %[[A:.*]]: vector<4xf32>)			// CHECK-SAME: %[[A:.*]]: vector<4xf32>)
	// CHECK: %[[T0:.*]] = llvm.shufflevector %[[A]], %[[A]] [2, 3] : vector<4xf32>			// CHECK: %[[T0:.*]] = llvm.shufflevector %[[A]], %[[A]] [2, 3] : vector<4xf32>
	// CHECK: return %[[T0]] : vector<2xf32>			// CHECK: return %[[T0]] : vector<2xf32>
	(1,170 lines not shown)

mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir

	(540 lines not shown)
	// CHECK: scf.for %[[IDX:.*]] = %[[C_0]] to %[[UB]] step %[[STEP]] {			// CHECK: scf.for %[[IDX:.*]] = %[[C_0]] to %[[UB]] step %[[STEP]] {
	// CHECK: %[[MASK_VAL:.*]] = vector.extractelement %[[MASK_VEC]][%[[IDX]] : index] : vector<[16]xi1>			// CHECK: %[[MASK_VAL:.*]] = vector.extractelement %[[MASK_VEC]][%[[IDX]] : index] : vector<[16]xi1>
	// CHECK: scf.if %[[MASK_VAL]] {			// CHECK: scf.if %[[MASK_VAL]] {
	// CHECK: %[[VAL_TO_STORE:.*]] = vector.extractelement %{{.*}}[%[[IDX]] : index] : vector<[16]xf32>			// CHECK: %[[VAL_TO_STORE:.*]] = vector.extractelement %{{.*}}[%[[IDX]] : index] : vector<[16]xf32>
	// CHECK: memref.store %[[VAL_TO_STORE]], %[[ARG_0]][%[[IDX]]] : memref<?xf32, strided<[?], offset: ?>>			// CHECK: memref.store %[[VAL_TO_STORE]], %[[ARG_0]][%[[IDX]]] : memref<?xf32, strided<[?], offset: ?>>
	// CHECK: } else {			// CHECK: } else {
	// CHECK: }			// CHECK: }
	// CHECK: }			// CHECK: }

				// -----

				func.func @vector_print_vector_0d(%arg0: vector<f32>) {
				vector.print %arg0 : vector<f32>
				return
				}
				// CHECK-LABEL: func.func @vector_print_vector_0d(
				// CHECK-SAME: %[[RANK_0_VEC:.*]]: vector<f32>) {
				// CHECK: %[[C0:.*]] = arith.constant 0 : index
				// CHECK: %[[C1:.*]] = arith.constant 1 : index
				// CHECK: %[[RANK_1_VEC:.*]] = vector.shape_cast %[[RANK_0_VEC]] : vector<f32> to vector<1xf32>
				// CHECK: vector.print <open>
				// CHECK: scf.for %[[IDX:.*]] = %[[C0]] to %[[C1]] step %[[C1]] {
				// CHECK: %[[EL:.*]] = vector.extractelement %[[RANK_1_VEC]]{{\[}}%[[IDX]] : index] : vector<1xf32>
				// CHECK: vector.print %[[EL]] : f32 <no_punctuation>
				// CHECK: %[[IS_NOT_LAST:.*]] = arith.cmpi ult, %[[IDX]], %[[C0]] : index
				// CHECK: scf.if %[[IS_NOT_LAST]] {
				c-rhodes (Done): nit: actual names like `IS_NOT_LAST_ITER` would be helpful in the tests
				// CHECK: vector.print <comma>
				// CHECK: }
				// CHECK: }
				// CHECK: vector.print <close>
				// CHECK: vector.print
				// CHECK: return
				// CHECK: }

				// -----

				func.func @vector_print_vector(%arg0: vector<2x2xf32>) {
				vector.print %arg0 : vector<2x2xf32>
				return
				}
				// CHECK-LABEL: func.func @vector_print_vector(
				// CHECK-SAME: %[[VEC_2D:.*]]: vector<2x2xf32>) {
				// CHECK: %[[C0:.*]] = arith.constant 0 : index
				// CHECK: %[[C2:.*]] = arith.constant 2 : index
				// CHECK: %[[C1:.*]] = arith.constant 1 : index
				// CHECK: %[[ALLOCA_VEC:.*]] = memref.alloca() : memref<vector<2x2xf32>>
				// CHECK: %[[ALLOCA:.*]] = memref.alloca() : memref<2x2xf32>
				// CHECK: memref.store %[[VEC_2D]], %[[ALLOCA_VEC]][] : memref<vector<2x2xf32>>
				// CHECK: %[[ALLOCA_TYPECAST:.*]] = vector.type_cast %[[ALLOCA_VEC]] : memref<vector<2x2xf32>> to memref<2xvector<2xf32>>
				// CHECK: scf.for %[[COPY_IDX:.*]] = %[[C0]] to %[[C2]] step %[[C1]] {
				// CHECK: %[[VEC_SLICE:.*]] = memref.load %[[ALLOCA_TYPECAST]]{{\[}}%[[COPY_IDX]]] : memref<2xvector<2xf32>>
				// CHECK: vector.transfer_write %[[VEC_SLICE]], %[[ALLOCA]]{{\[}}%[[COPY_IDX]], %[[C0]]] {in_bounds = [true]} : vector<2xf32>, memref<2x2xf32>
				// CHECK: }
				// CHECK: vector.print <open>
				// CHECK: scf.for %[[I:.*]] = %[[C0]] to %[[C2]] step %[[C1]] {
				// CHECK: vector.print <open>
				// CHECK: scf.for %[[J:.*]] = %[[C0]] to %[[C2]] step %[[C1]] {
				// CHECK: %[[EL:.*]] = memref.load %[[ALLOCA]]{{\[}}%[[I]], %[[J]]] : memref<2x2xf32>
				// CHECK: vector.print %[[EL]] : f32 <no_punctuation>
				// CHECK: %[[IS_NOT_LAST_INNER:.*]] = arith.cmpi ult, %[[J]], %[[C1]] : index
				// CHECK: scf.if %[[IS_NOT_LAST_INNER]] {
				// CHECK: vector.print <comma>
				// CHECK: }
				// CHECK: }
				// CHECK: vector.print <close>
				// CHECK: %[[IS_NOT_LAST_OUTER:.*]] = arith.cmpi ult, %[[I]], %[[C1]] : index
				// CHECK: scf.if %[[IS_NOT_LAST_OUTER]] {
				// CHECK: vector.print <comma>
				// CHECK: }
				// CHECK: }
				// CHECK: vector.print <close>
				// CHECK: vector.print
				// CHECK: return
				// CHECK: }

				// -----

				func.func @vector_print_scalable_vector(%arg0: vector<[4]xi32>) {
				vector.print %arg0 : vector<[4]xi32>
				return
				}
				// CHECK-LABEL: func.func @vector_print_scalable_vector(
				// CHECK-SAME: %[[SCALABLE_VEC:.*]]: vector<[4]xi32>) {
				awarzynski (Done): How about 2d scalable vectors?
				benmxwl-arm (Author, Done): I could add a test for the SCF output here (which I believe is valid), though currently >= 2D scalables can't be lowered any further.
				awarzynski (Done): To me that's a good reason to completely disallow 2D scalables. If we are not going to use it, it's effectively dead code 🤔. This way we are much clearer about what's supported and what isn't.
				benmxwl-arm (Author, Done): It does not require extra code to support here (it comes as part of already supporting n-D vectors + scalable vectors). It's just that currently 2D+ scalable vectors can't be lowered to LLVM, so the lowering actually fails before anything specific to `vector.print` (e.g. at the `arith.constant dense<0.0> : vector<[4]x[4]xf32>` in the test case).
				awarzynski (Done): This is a good point, but I would still be hesitant to allow `vector.print %arg0 : vector<[4]x[4]xf32>` anywhere. We know that it will never be lowered to anything useful (well, there's nothing on the horizon). In cases like this I try to follow the principle of least surprise :) Having said that, I appreciate that it's a bit weird to impose such limitations at the Vector dialect level. Please go with whichever approach you prefer.
				// CHECK: %[[C0:.*]] = arith.constant 0 : index
				// CHECK: %[[C4:.*]] = arith.constant 4 : index
				// CHECK: %[[C1:.*]] = arith.constant 1 : index
				// CHECK: %[[VSCALE:.*]] = vector.vscale
				// CHECK: %[[UPPER_BOUND:.*]] = arith.muli %[[VSCALE]], %[[C4]] : index
				// CHECK: %[[LAST_IDX:.*]] = arith.subi %[[UPPER_BOUND]], %[[C1]] : index
				// CHECK: vector.print <open>
				// CHECK: scf.for %[[IDX:.*]] = %[[C0]] to %[[UPPER_BOUND]] step %[[C1]] {
				// CHECK: %[[EL:.*]] = vector.extractelement %[[SCALABLE_VEC]]{{\[}}%[[IDX]] : index] : vector<[4]xi32>
				// CHECK: vector.print %[[EL]] : i32 <no_punctuation>
				// CHECK: %[[IS_NOT_LAST:.*]] = arith.cmpi ult, %[[IDX]], %[[LAST_IDX]] : index
				// CHECK: scf.if %[[IS_NOT_LAST]] {
				// CHECK: vector.print <comma>
				// CHECK: }
				// CHECK: }
				// CHECK: vector.print <close>
				// CHECK: vector.print
				// CHECK: return
				// CHECK: }

				// -----

				func.func @vector_print_2d_scalable_vector(%arg0: vector<[4]x[4]xf32>) {
				vector.print %arg0 : vector<[4]x[4]xf32>
				return
				}
				// CHECK-LABEL: func.func @vector_print_2d_scalable_vector(
				// CHECK-SAME: %[[SCALABLE_VEC_2D:.*]]: vector<[4]x[4]xf32>) {
				// CHECK: %[[C0:.*]] = arith.constant 0 : index
				// CHECK: %[[C4:.*]] = arith.constant 4 : index
				// CHECK: %[[C1:.*]] = arith.constant 1 : index
				// CHECK: %[[ALLOCA_VEC:.*]] = memref.alloca() : memref<vector<[4]x[4]xf32>>
				// CHECK: %[[VSCALE_0:.*]] = vector.vscale
				// CHECK: %[[UPPER_BOUND_0:.*]] = arith.muli %[[VSCALE_0]], %[[C4]] : index
				// CHECK: %[[VSCALE_1:.*]] = vector.vscale
				// CHECK: %[[UPPER_BOUND_1:.*]] = arith.muli %[[VSCALE_1]], %[[C4]] : index
				// CHECK: %[[ALLOCA:.*]] = memref.alloca(%[[UPPER_BOUND_0]], %[[UPPER_BOUND_1]]) : memref<?x?xf32>
				// CHECK: memref.store %[[SCALABLE_VEC_2D]], %[[ALLOCA_VEC]][] : memref<vector<[4]x[4]xf32>>
				// CHECK: %[[VAL_10:.*]] = vector.type_cast %[[ALLOCA_VEC]] : memref<vector<[4]x[4]xf32>> to memref<4xvector<4xf32>>
				// CHECK: scf.for %[[II:.*]] = %[[C0]] to %[[C4]] step %[[C1]] {
				// CHECK: %[[NOT_END:.*]] = arith.cmpi sgt, %[[UPPER_BOUND_0]], %[[II]] : index
				// CHECK: scf.if %[[NOT_END]] {
				// CHECK: %[[VECTOR_SLICE:.*]] = memref.load %[[VAL_10]]{{\[}}%[[II]]] : memref<4xvector<4xf32>>
				// CHECK: vector.transfer_write %[[VECTOR_SLICE]], %[[ALLOCA]]{{\[}}%[[II]], %[[C0]]] : vector<4xf32>, memref<?x?xf32>
				// CHECK: } else {
				// CHECK: }
				// CHECK: }
				// CHECK: %[[VSCALE_2:.*]] = vector.vscale
				// CHECK: %[[UPPER_BOUND_3:.*]] = arith.muli %[[VSCALE_2]], %[[C4]] : index
				// CHECK: %[[LAST_INDEX_0:.*]] = arith.subi %[[UPPER_BOUND_3]], %[[C1]] : index
				// CHECK: vector.print <open>
				// CHECK: scf.for %[[OUTER_IDX:.*]] = %[[C0]] to %[[UPPER_BOUND_3]] step %[[C1]] {
				// CHECK: %[[VSCALE_3:.*]] = vector.vscale
				// CHECK: %[[UPPER_BOUND_4:.*]] = arith.muli %[[VSCALE_3]], %[[C4]] : index
				// CHECK: %[[LAST_INDEX_1:.*]] = arith.subi %[[UPPER_BOUND_4]], %[[C1]] : index
				// CHECK: vector.print <open>
				// CHECK: scf.for %[[INNER_IDX:.*]] = %[[C0]] to %[[UPPER_BOUND_4]] step %[[C1]] {
				// CHECK: %[[EL:.*]] = memref.load %[[ALLOCA]]{{\[}}%[[OUTER_IDX]], %[[INNER_IDX]]] : memref<?x?xf32>
				// CHECK: vector.print %[[EL]] : f32 <no_punctuation>
				// CHECK: %[[IS_NOT_LAST_0:.*]] = arith.cmpi ult, %[[INNER_IDX]], %[[LAST_INDEX_1]] : index
				// CHECK: scf.if %[[IS_NOT_LAST_0]] {
				// CHECK: vector.print <comma>
				// CHECK: }
				// CHECK: }
				// CHECK: vector.print <close>
				// CHECK: %[[IS_NOT_LAST_1:.*]] = arith.cmpi ult, %[[OUTER_IDX]], %[[LAST_INDEX_0]] : index
				// CHECK: scf.if %[[IS_NOT_LAST_1]] {
				// CHECK: vector.print <comma>
				// CHECK: }
				// CHECK: }
				// CHECK: vector.print <close>
				// CHECK: vector.print
				// CHECK: return
				// CHECK: }
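Reviewer note: the CHECK lines above verify a nested loop that prints an opening bracket, each element, a comma after every element except the last, and a closing bracket, once per dimension, followed by a bare `vector.print`. Below is a minimal Python sketch of that control flow only. The function name `print_2d_sketch` is hypothetical, and the exact punctuation strings (`"( "`, `", "`, `" )"`, trailing newline) are assumptions modelled on the `( 8, 10, ... )` style seen in the integration-test CHECK lines, not output taken from the runtime.

```python
def print_2d_sketch(rows):
    """Mimic the loop structure of the lowered 2-D vector.print (illustrative only)."""
    out = []
    last_outer = len(rows) - 1            # plays the role of LAST_INDEX_0
    out.append("( ")                      # vector.print <open>
    for outer_idx, row in enumerate(rows):
        last_inner = len(row) - 1         # plays the role of LAST_INDEX_1
        out.append("( ")                  # vector.print <open>
        for inner_idx, el in enumerate(row):
            out.append(str(el))           # vector.print %el <no_punctuation>
            if inner_idx < last_inner:    # arith.cmpi ult, INNER_IDX, LAST_INDEX_1
                out.append(", ")          # vector.print <comma>
        out.append(" )")                  # vector.print <close>
        if outer_idx < last_outer:        # arith.cmpi ult, OUTER_IDX, LAST_INDEX_0
            out.append(", ")              # vector.print <comma>
    out.append(" )")                      # vector.print <close>
    out.append("\n")                      # final operand-less vector.print (assumed newline)
    return "".join(out)


if __name__ == "__main__":
    print(print_2d_sketch([[1, 2], [3, 4]]), end="")
```

In the actual lowering the loop bounds come from `vector.vscale` multiplied by the base size (4 here), so the row and column counts are only known at runtime; the sketch takes them from the input lists instead.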

mlir/test/Integration/Dialect/Arith/CPU/test-wide-int-emulation-compare-results-i16.mlir

	// Check that the wide integer emulation produces the same result as wide			// Check that the wide integer emulation produces the same result as wide
	// calculations. Emulate i16 ops with i8 ops.			// calculations. Emulate i16 ops with i8 ops.

	// RUN: mlir-opt %s --test-arith-emulate-wide-int="widest-int-supported=8" \			// RUN: mlir-opt %s --test-arith-emulate-wide-int="widest-int-supported=8" \
	// RUN: --convert-scf-to-cf --convert-cf-to-llvm --convert-vector-to-llvm \			// RUN: --convert-vector-to-scf --convert-scf-to-cf --convert-cf-to-llvm \
	// RUN: --convert-func-to-llvm --convert-arith-to-llvm \| \			// RUN: --convert-vector-to-llvm --convert-func-to-llvm --convert-arith-to-llvm \
				// RUN: --reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: --shared-libs="%mlir_c_runner_utils,%mlir_runner_utils" \| \			// RUN: --shared-libs="%mlir_c_runner_utils,%mlir_runner_utils" \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	// CHECK-NOT: Mismatch			// CHECK-NOT: Mismatch

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Common Utility Functions			// Common Utility Functions

mlir/test/Integration/Dialect/Arith/CPU/test-wide-int-emulation-constants-i16.mlir

	// Check that the wide integer constant emulation produces the same result as wide			// Check that the wide integer constant emulation produces the same result as wide
	// constants and that printing works. Emulate i16 ops with i8 ops.			// constants and that printing works. Emulate i16 ops with i8 ops.

	// RUN: mlir-opt %s --test-arith-emulate-wide-int="widest-int-supported=8" \			// RUN: mlir-opt %s --test-arith-emulate-wide-int="widest-int-supported=8" \
	// RUN: --convert-scf-to-cf --convert-cf-to-llvm --convert-vector-to-llvm \			// RUN: --convert-vector-to-scf --convert-scf-to-cf --convert-cf-to-llvm --convert-vector-to-llvm \
	// RUN: --convert-func-to-llvm --convert-arith-to-llvm \| \			// RUN: --convert-func-to-llvm --convert-arith-to-llvm --reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: --shared-libs=%mlir_c_runner_utils \| \			// RUN: --shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s --match-full-lines --check-prefix=EMULATED			// RUN: FileCheck %s --match-full-lines --check-prefix=EMULATED

	func.func @entry() {			func.func @entry() {
	%cst0 = arith.constant 0 : i16			%cst0 = arith.constant 0 : i16
	func.call @emulate_constant(%cst0) : (i16) -> ()			func.call @emulate_constant(%cst0) : (i16) -> ()
	func.call @foo(%cst0) : (i16) -> ()			func.call @foo(%cst0) : (i16) -> ()

mlir/test/Integration/Dialect/LLVMIR/CPU/X86/test-inline-asm-vector.mlir

// RUN: mlir-opt %s -convert-vector-to-llvm \| \		// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
// RUN: mlir-cpu-runner -e entry_point_with_all_constants -entry-point-result=void \		// RUN: mlir-cpu-runner -e entry_point_with_all_constants -entry-point-result=void \
// RUN: -shared-libs=%mlir_c_runner_utils		// RUN: -shared-libs=%mlir_c_runner_utils

module {		module {
llvm.func @function_to_run(%a: vector<8xf32>, %b: vector<8xf32>) {		func.func @function_to_run(%a: vector<8xf32>, %b: vector<8xf32>) {
		awarzynski [Not Done]: Which conversion fails if you keep this as `llvm.func`?
		benmxwl-arm (author) [Done]: I may be lowering it incorrectly, but it gets stuck with some `index` types and `cf.br`s that fail to lower to LLVM following something like: `-convert-vector-to-scf -convert-scf-to-cf -convert-cf-to-llvm -convert-vector-to-llvm -convert-arith-to-llvm -reconcile-unrealized-casts`
// CHECK: ( 8, 10, 12, 14, 16, 18, 20, 22 )		// CHECK: ( 8, 10, 12, 14, 16, 18, 20, 22 )
%r0 = llvm.inline_asm asm_dialect = intel		%r0 = llvm.inline_asm asm_dialect = intel
"vaddps $0, $1, $2", "=x,x,x" %a, %b:		"vaddps $0, $1, $2", "=x,x,x" %a, %b:
(vector<8xf32>, vector<8xf32>) -> vector<8xf32>		(vector<8xf32>, vector<8xf32>) -> vector<8xf32>
vector.print %r0: vector<8xf32>		vector.print %r0: vector<8xf32>

// vblendps implemented with inline_asm.		// vblendps implemented with inline_asm.
// CHECK: ( 0, 1, 10, 11, 4, 5, 14, 15 )		// CHECK: ( 0, 1, 10, 11, 4, 5, 14, 15 )
vector.print %r3: vector<8xf32>		vector.print %r3: vector<8xf32>

// vblendps 0x33 via vector.shuffle (emulates clang intrinsics impl)		// vblendps 0x33 via vector.shuffle (emulates clang intrinsics impl)
// CHECK: ( 8, 9, 2, 3, 12, 13, 6, 7 )		// CHECK: ( 8, 9, 2, 3, 12, 13, 6, 7 )
%r4 = vector.shuffle %a, %b[8, 9, 2, 3, 12, 13, 6, 7]		%r4 = vector.shuffle %a, %b[8, 9, 2, 3, 12, 13, 6, 7]
: vector<8xf32>, vector<8xf32>		: vector<8xf32>, vector<8xf32>
vector.print %r4: vector<8xf32>		vector.print %r4: vector<8xf32>

llvm.return		return
}		}

// Solely exists to prevent inlining and get the expected assembly.		// Solely exists to prevent inlining and get the expected assembly.
llvm.func @entry_point(%a: vector<8xf32>, %b: vector<8xf32>) {		func.func @entry_point(%a: vector<8xf32>, %b: vector<8xf32>) {
llvm.call @function_to_run(%a, %b) : (vector<8xf32>, vector<8xf32>) -> ()		func.call @function_to_run(%a, %b) : (vector<8xf32>, vector<8xf32>) -> ()
llvm.return		return
}		}

llvm.func @entry_point_with_all_constants() {		func.func @entry_point_with_all_constants() {
%a = llvm.mlir.constant(dense<[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]>		%a = llvm.mlir.constant(dense<[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]>
: vector<8xf32>) : vector<8xf32>		: vector<8xf32>) : vector<8xf32>
%b = llvm.mlir.constant(dense<[8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0]>		%b = llvm.mlir.constant(dense<[8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0]>
: vector<8xf32>) : vector<8xf32>		: vector<8xf32>) : vector<8xf32>
llvm.call @function_to_run(%a, %b) : (vector<8xf32>, vector<8xf32>) -> ()		func.call @function_to_run(%a, %b) : (vector<8xf32>, vector<8xf32>) -> ()
llvm.return		return
}		}
}		}

mlir/test/Integration/Dialect/LLVMIR/CPU/test-vp-intrinsic.mlir

	// RUN: mlir-opt %s -convert-vector-to-llvm -finalize-memref-to-llvm \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-cf-to-llvm \
	// RUN: -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: -convert-vector-to-llvm -convert-index-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm \
				// RUN: -reconcile-unrealized-casts \| \
	// RUN: mlir-translate -mlir-to-llvmir \| \			// RUN: mlir-translate -mlir-to-llvmir \| \
	// RUN: %lli --entry-function=entry \			// RUN: %lli --entry-function=entry \
	// RUN: --dlopen=%mlir_native_utils_lib_dir/libmlir_c_runner_utils%shlibext \| \			// RUN: --dlopen=%mlir_native_utils_lib_dir/libmlir_c_runner_utils%shlibext \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	// %mlir_native_utils_lib_dir is incorrect on Windows			// %mlir_native_utils_lib_dir is incorrect on Windows
	// UNSUPPORTED: system-windows			// UNSUPPORTED: system-windows


mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/test-sve.mlir

	// RUN: mlir-opt %s -lower-affine -convert-scf-to-cf -convert-vector-to-llvm="enable-arm-sve" -finalize-memref-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -canonicalize \| \			// RUN: mlir-opt %s -lower-affine -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm="enable-arm-sve" -finalize-memref-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -canonicalize \| \
	// RUN: %mcr_aarch64_cmd -e=entry -entry-point-result=void --march=aarch64 --mattr="+sve" -shared-libs=%mlir_lib_dir/libmlir_c_runner_utils%shlibext \| \			// RUN: %mcr_aarch64_cmd -e=entry -entry-point-result=void --march=aarch64 --mattr="+sve" -shared-libs=%mlir_lib_dir/libmlir_c_runner_utils%shlibext \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	// Note: To run this test, your CPU must support SVE			// Note: To run this test, your CPU must support SVE

	// VLA memcopy			// VLA memcopy
	func.func @kernel_copy(%src : memref<?xi64>, %dst : memref<?xi64>, %size : index) {			func.func @kernel_copy(%src : memref<?xi64>, %dst : memref<?xi64>, %size : index) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index

mlir/test/Integration/Dialect/Vector/CPU/X86Vector/test-dot.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm="enable-x86vector" -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm="enable-x86vector" -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-translate --mlir-to-llvmir \| \			// RUN: mlir-translate --mlir-to-llvmir \| \
	// RUN: %lli --entry-function=entry --mattr="avx" --dlopen=%mlir_c_runner_utils \| \			// RUN: %lli --entry-function=entry --mattr="avx" --dlopen=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() -> i32 {			func.func @entry() -> i32 {
	%i0 = arith.constant 0 : i32			%i0 = arith.constant 0 : i32
	%i4 = arith.constant 4 : i32			%i4 = arith.constant 4 : i32


mlir/test/Integration/Dialect/Vector/CPU/X86Vector/test-mask-compress.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm="enable-x86vector" -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm="enable-x86vector" -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-translate --mlir-to-llvmir \| \			// RUN: mlir-translate --mlir-to-llvmir \| \
	// RUN: %lli --entry-function=entry --mattr="avx512bw" --dlopen=%mlir_c_runner_utils \| \			// RUN: %lli --entry-function=entry --mattr="avx512bw" --dlopen=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() -> i32 {			func.func @entry() -> i32 {
	%i0 = arith.constant 0 : i32			%i0 = arith.constant 0 : i32

	%a = arith.constant dense<[1., 0., 0., 2., 4., 3., 5., 7., 8., 1., 5., 5., 3., 1., 0., 7.]> : vector<16xf32>			%a = arith.constant dense<[1., 0., 0., 2., 4., 3., 5., 7., 8., 1., 5., 5., 3., 1., 0., 7.]> : vector<16xf32>

mlir/test/Integration/Dialect/Vector/CPU/X86Vector/test-rsqrt.mlir

	// RUN: mlir-opt %s -convert-vector-to-llvm="enable-x86vector" -convert-func-to-llvm \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm="enable-x86vector" -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-translate --mlir-to-llvmir \| \			// RUN: mlir-translate --mlir-to-llvmir \| \
	// RUN: %lli --entry-function=entry --mattr="avx" --dlopen=%mlir_c_runner_utils \| \			// RUN: %lli --entry-function=entry --mattr="avx" --dlopen=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() -> i32 {			func.func @entry() -> i32 {
	%i0 = arith.constant 0 : i32			%i0 = arith.constant 0 : i32

	%v = arith.constant dense<[0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]> : vector<8xf32>			%v = arith.constant dense<[0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]> : vector<8xf32>
	%r = x86vector.avx.rsqrt %v : vector<8xf32>			%r = x86vector.avx.rsqrt %v : vector<8xf32>
	// `rsqrt` may produce slightly different results on Intel and AMD machines: accept both results here.			// `rsqrt` may produce slightly different results on Intel and AMD machines: accept both results here.
	// CHECK: {{( 2.82[0-9], 1.99[0-9], 1.41[0-9], 0.99[0-9], 0.70[0-9], 0.49[0-9], 0.35[0-9], 0.24[0-9] )}}			// CHECK: {{( 2.82[0-9], 1.99[0-9], 1.41[0-9], 0.99[0-9], 0.70[0-9], 0.49[0-9], 0.35[0-9], 0.24[0-9] )}}
	vector.print %r : vector<8xf32>			vector.print %r : vector<8xf32>

	return %i0 : i32			return %i0 : i32
	}			}

mlir/test/Integration/Dialect/Vector/CPU/X86Vector/test-vp2intersect-i32.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm="enable-x86vector" -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm="enable-x86vector" -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-translate --mlir-to-llvmir \| \			// RUN: mlir-translate --mlir-to-llvmir \| \
	// RUN: %lli --entry-function=entry --mattr="avx512bw,avx512vp2intersect" --dlopen=%mlir_c_runner_utils \| \			// RUN: %lli --entry-function=entry --mattr="avx512bw,avx512vp2intersect" --dlopen=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	// Note: To run this test, your CPU must support AVX512 vp2intersect.			// Note: To run this test, your CPU must support AVX512 vp2intersect.

	func.func @entry() -> i32 {			func.func @entry() -> i32 {
	%i0 = arith.constant 0 : i32			%i0 = arith.constant 0 : i32

mlir/test/Integration/Dialect/Vector/CPU/test-0-d-vectors.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -test-lower-to-llvm \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @extract_element_0d(%a: vector<f32>) {			func.func @extract_element_0d(%a: vector<f32>) {
	%1 = vector.extractelement %a[] : vector<f32>			%1 = vector.extractelement %a[] : vector<f32>
	// CHECK: 42			// CHECK: 42
	vector.print %1: f32			vector.print %1: f32

mlir/test/Integration/Dialect/Vector/CPU/test-broadcast.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -test-lower-to-llvm \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	%i = arith.constant 2147483647: i32			%i = arith.constant 2147483647: i32
	%l = arith.constant 9223372036854775807 : i64			%l = arith.constant 9223372036854775807 : i64


mlir/test/Integration/Dialect/Vector/CPU/test-compress.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -test-lower-to-llvm \| \
				awarzynski [Done]: This is a bit unrelated. We should either switch all tests to `-test-lower-to-llvm` or none. IMO, it would be safer to keep the original pipeline. Switching to `-test-lower-to-llvm` could be proposed separately.
				benmxwl-arm (author) [Done]: It's mainly because these tests use memrefs now, and hand-rolling the pipeline is quite unwieldy (and the original can't be kept anyway).
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @compress16(%base: memref<?xf32>,			func.func @compress16(%base: memref<?xf32>,
	%mask: vector<16xi1>, %value: vector<16xf32>) {			%mask: vector<16xi1>, %value: vector<16xf32>) {
	%c0 = arith.constant 0: index			%c0 = arith.constant 0: index
	vector.compressstore %base[%c0], %mask, %value			vector.compressstore %base[%c0], %mask, %value

mlir/test/Integration/Dialect/Vector/CPU/test-constant-mask.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -test-lower-to-llvm \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	%0 = vector.constant_mask [4] : vector<8xi1>			%0 = vector.constant_mask [4] : vector<8xi1>
	vector.print %0 : vector<8xi1>			vector.print %0 : vector<8xi1>
	// CHECK: ( 1, 1, 1, 1, 0, 0, 0, 0 )			// CHECK: ( 1, 1, 1, 1, 0, 0, 0, 0 )

mlir/test/Integration/Dialect/Vector/CPU/test-contraction.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -test-lower-to-llvm \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	#dotp_accesses = [			#dotp_accesses = [
	affine_map<(i) -> (i)>,			affine_map<(i) -> (i)>,
	affine_map<(i) -> (i)>,			affine_map<(i) -> (i)>,
	affine_map<(i) -> ()>			affine_map<(i) -> ()>

mlir/test/Integration/Dialect/Vector/CPU/test-create-mask-v4i1.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -test-lower-to-llvm \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	// NOTE: This is similar to test-create-mask.mlir, but with a different length,			// NOTE: This is similar to test-create-mask.mlir, but with a different length,
	// because the v4i1 vector specifically exposed bugs in the LLVM backend.			// because the v4i1 vector specifically exposed bugs in the LLVM backend.

	func.func @entry() {			func.func @entry() {

mlir/test/Integration/Dialect/Vector/CPU/test-create-mask.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -test-lower-to-llvm \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	%cneg1 = arith.constant -1 : index			%cneg1 = arith.constant -1 : index
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%c1 = arith.constant 1 : index			%c1 = arith.constant 1 : index

mlir/test/Integration/Dialect/Vector/CPU/test-expand.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @expand16(%base: memref<?xf32>,			func.func @expand16(%base: memref<?xf32>,
	%mask: vector<16xi1>,			%mask: vector<16xi1>,
	%pass_thru: vector<16xf32>) -> vector<16xf32> {			%pass_thru: vector<16xf32>) -> vector<16xf32> {
	%c0 = arith.constant 0: index			%c0 = arith.constant 0: index

mlir/test/Integration/Dialect/Vector/CPU/test-extract-strided-slice.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -test-lower-to-llvm \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	%f0 = arith.constant 0.0: f32			%f0 = arith.constant 0.0: f32
	%f1 = arith.constant 1.0: f32			%f1 = arith.constant 1.0: f32
	%f2 = arith.constant 2.0: f32			%f2 = arith.constant 2.0: f32

mlir/test/Integration/Dialect/Vector/CPU/test-flat-transpose-col.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -O0 -enable-matrix -matrix-allow-contract -matrix-default-layout=column-major \			// RUN: -O0 -enable-matrix -matrix-allow-contract -matrix-default-layout=column-major \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	%f0 = arith.constant 0.0: f64			%f0 = arith.constant 0.0: f64
	%f1 = arith.constant 1.0: f64			%f1 = arith.constant 1.0: f64

mlir/test/Integration/Dialect/Vector/CPU/test-flat-transpose-row.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -O0 -enable-matrix -matrix-allow-contract -matrix-default-layout=row-major \			// RUN: -O0 -enable-matrix -matrix-allow-contract -matrix-default-layout=row-major \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	%f0 = arith.constant 0.0: f64			%f0 = arith.constant 0.0: f64
	%f1 = arith.constant 1.0: f64			%f1 = arith.constant 1.0: f64

mlir/test/Integration/Dialect/Vector/CPU/test-fma.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	%f1 = arith.constant 1.0: f32			%f1 = arith.constant 1.0: f32
	%f3 = arith.constant 3.0: f32			%f3 = arith.constant 3.0: f32
	%f7 = arith.constant 7.0: f32			%f7 = arith.constant 7.0: f32

mlir/test/Integration/Dialect/Vector/CPU/test-gather.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @gather8(%base: memref<?xf32>, %indices: vector<8xi32>,			func.func @gather8(%base: memref<?xf32>, %indices: vector<8xi32>,
	%mask: vector<8xi1>, %pass_thru: vector<8xf32>) -> vector<8xf32> {			%mask: vector<8xi1>, %pass_thru: vector<8xf32>) -> vector<8xf32> {
	%c0 = arith.constant 0: index			%c0 = arith.constant 0: index
	%g = vector.gather %base[%c0][%indices], %mask, %pass_thru			%g = vector.gather %base[%c0][%indices], %mask, %pass_thru

mlir/test/Integration/Dialect/Vector/CPU/test-index-vectors.mlir

	// RUN: mlir-opt %s -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -test-lower-to-llvm \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	%c0 = arith.constant dense<[0, 1, 2, 3]>: vector<4xindex>			%c0 = arith.constant dense<[0, 1, 2, 3]>: vector<4xindex>
	%c1 = arith.constant dense<[0, 1]>: vector<2xindex>			%c1 = arith.constant dense<[0, 1]>: vector<2xindex>
	%c2 = arith.constant 2 : index			%c2 = arith.constant 2 : index

mlir/test/Integration/Dialect/Vector/CPU/test-insert-strided-slice.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -test-lower-to-llvm \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	%f1 = arith.constant 1.0: f32			%f1 = arith.constant 1.0: f32
	%f2 = arith.constant 2.0: f32			%f2 = arith.constant 2.0: f32
	%f3 = arith.constant 3.0: f32			%f3 = arith.constant 3.0: f32

mlir/test/Integration/Dialect/Vector/CPU/test-maskedload.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @maskedload16(%base: memref<?xf32>, %mask: vector<16xi1>,			func.func @maskedload16(%base: memref<?xf32>, %mask: vector<16xi1>,
	%pass_thru: vector<16xf32>) -> vector<16xf32> {			%pass_thru: vector<16xf32>) -> vector<16xf32> {
	%c0 = arith.constant 0: index			%c0 = arith.constant 0: index
	%ld = vector.maskedload %base[%c0], %mask, %pass_thru			%ld = vector.maskedload %base[%c0], %mask, %pass_thru

mlir/test/Integration/Dialect/Vector/CPU/test-maskedstore.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @maskedstore16(%base: memref<?xf32>,			func.func @maskedstore16(%base: memref<?xf32>,
	%mask: vector<16xi1>, %value: vector<16xf32>) {			%mask: vector<16xi1>, %value: vector<16xf32>) {
	%c0 = arith.constant 0: index			%c0 = arith.constant 0: index
	vector.maskedstore %base[%c0], %mask, %value			vector.maskedstore %base[%c0], %mask, %value

mlir/test/Integration/Dialect/Vector/CPU/test-matrix-multiply-col.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -O0 -enable-matrix -matrix-allow-contract -matrix-default-layout=column-major \			// RUN: -O0 -enable-matrix -matrix-allow-contract -matrix-default-layout=column-major \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	%f0 = arith.constant 0.0: f64			%f0 = arith.constant 0.0: f64
	%f1 = arith.constant 1.0: f64			%f1 = arith.constant 1.0: f64

mlir/test/Integration/Dialect/Vector/CPU/test-matrix-multiply-row.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -O0 -enable-matrix -matrix-allow-contract -matrix-default-layout=row-major \			// RUN: -O0 -enable-matrix -matrix-allow-contract -matrix-default-layout=row-major \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	%f0 = arith.constant 0.0: f64			%f0 = arith.constant 0.0: f64
	%f1 = arith.constant 1.0: f64			%f1 = arith.constant 1.0: f64

mlir/test/Integration/Dialect/Vector/CPU/test-outerproduct-f32.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -test-lower-to-llvm \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	!vector_type_A = vector<8xf32>			!vector_type_A = vector<8xf32>
	!vector_type_B = vector<8xf32>			!vector_type_B = vector<8xf32>
	!vector_type_C = vector<8x8xf32>			!vector_type_C = vector<8x8xf32>


mlir/test/Integration/Dialect/Vector/CPU/test-outerproduct-i64.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -test-lower-to-llvm \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	!vector_type_A = vector<8xi64>			!vector_type_A = vector<8xi64>
	!vector_type_B = vector<8xi64>			!vector_type_B = vector<8xi64>
	!vector_type_C = vector<8x8xi64>			!vector_type_C = vector<8x8xi64>


mlir/test/Integration/Dialect/Vector/CPU/test-print-fp.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	//			//
	// Test various floating-point types.			// Test various floating-point types.
	//			//
	func.func @entry() {			func.func @entry() {

mlir/test/Integration/Dialect/Vector/CPU/test-print-int.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	//			//
	// Test various signless, signed, unsigned integer types.			// Test various signless, signed, unsigned integer types.
	//			//
	func.func @entry() {			func.func @entry() {

mlir/test/Integration/Dialect/Vector/CPU/test-realloc.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \|\			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \|\
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils			// RUN: -shared-libs=%mlir_c_runner_utils
	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -finalize-memref-to-llvm='use-aligned-alloc=1' -convert-func-to-llvm -arith-expand -reconcile-unrealized-casts \|\			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -finalize-memref-to-llvm='use-aligned-alloc=1' -convert-func-to-llvm -arith-expand -reconcile-unrealized-casts \|\
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| FileCheck %s			// RUN: -shared-libs=%mlir_c_runner_utils \| FileCheck %s

	// FIXME: Windows does not have aligned_alloc			// FIXME: Windows does not have aligned_alloc
	// UNSUPPORTED: system-windows			// UNSUPPORTED: system-windows

	func.func @entry() {			func.func @entry() {
	// Set up memory.			// Set up memory.

mlir/test/Integration/Dialect/Vector/CPU/test-reductions-f32-reassoc.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf \
	// RUN: -convert-vector-to-llvm='reassociate-fp-reductions' \			// RUN: -convert-vector-to-llvm='reassociate-fp-reductions' \
	// RUN: -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	// Construct test vector, numerically very stable.			// Construct test vector, numerically very stable.

mlir/test/Integration/Dialect/Vector/CPU/test-reductions-f32.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	// Construct test vector.			// Construct test vector.
	%f1 = arith.constant 1.5: f32			%f1 = arith.constant 1.5: f32
	%f2 = arith.constant 2.0: f32			%f2 = arith.constant 2.0: f32

mlir/test/Integration/Dialect/Vector/CPU/test-reductions-f64-reassoc.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf \
	// RUN: -convert-vector-to-llvm='reassociate-fp-reductions' \			// RUN: -convert-vector-to-llvm='reassociate-fp-reductions' \
	// RUN: -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	// Construct test vector, numerically very stable.			// Construct test vector, numerically very stable.

mlir/test/Integration/Dialect/Vector/CPU/test-reductions-f64.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	// Construct test vector.			// Construct test vector.
	%f1 = arith.constant 1.5: f64			%f1 = arith.constant 1.5: f64
	%f2 = arith.constant 2.0: f64			%f2 = arith.constant 2.0: f64

mlir/test/Integration/Dialect/Vector/CPU/test-reductions-i32.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	// Construct test vector.			// Construct test vector.
	%i1 = arith.constant 1: i32			%i1 = arith.constant 1: i32
	%i2 = arith.constant 2: i32			%i2 = arith.constant 2: i32

mlir/test/Integration/Dialect/Vector/CPU/test-reductions-i4.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	%v = arith.constant dense<[-8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]> : vector<24xi4>			%v = arith.constant dense<[-8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]> : vector<24xi4>
	vector.print %v : vector<24xi4>			vector.print %v : vector<24xi4>
	//			//

mlir/test/Integration/Dialect/Vector/CPU/test-reductions-i64.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	// Construct test vector.			// Construct test vector.
	%i1 = arith.constant 1: i64			%i1 = arith.constant 1: i64
	%i2 = arith.constant 2: i64			%i2 = arith.constant 2: i64

mlir/test/Integration/Dialect/Vector/CPU/test-reductions-si4.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	%v0 = arith.constant dense<[-8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7]> : vector<16xi4>			%v0 = arith.constant dense<[-8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7]> : vector<16xi4>
	%v = vector.bitcast %v0 : vector<16xi4> to vector<16xsi4>			%v = vector.bitcast %v0 : vector<16xi4> to vector<16xsi4>
	vector.print %v : vector<16xsi4>			vector.print %v : vector<16xsi4>

mlir/test/Integration/Dialect/Vector/CPU/test-reductions-ui4.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	%v0 = arith.constant dense<[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]> : vector<16xi4>			%v0 = arith.constant dense<[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]> : vector<16xi4>
	%v = vector.bitcast %v0 : vector<16xi4> to vector<16xui4>			%v = vector.bitcast %v0 : vector<16xi4> to vector<16xui4>
	vector.print %v : vector<16xui4>			vector.print %v : vector<16xui4>

mlir/test/Integration/Dialect/Vector/CPU/test-scan.mlir

	// RUN: mlir-opt %s -test-vector-scan-lowering -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -test-vector-scan-lowering -test-lower-to-llvm\| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	%f1 = arith.constant 1.0: f32			%f1 = arith.constant 1.0: f32
	%f2 = arith.constant 2.0: f32			%f2 = arith.constant 2.0: f32
	%f3 = arith.constant 3.0: f32			%f3 = arith.constant 3.0: f32

mlir/test/Integration/Dialect/Vector/CPU/test-scatter.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @scatter8(%base: memref<?xf32>,			func.func @scatter8(%base: memref<?xf32>,
	%indices: vector<8xi32>,			%indices: vector<8xi32>,
	%mask: vector<8xi1>, %value: vector<8xf32>) {			%mask: vector<8xi1>, %value: vector<8xf32>) {
	%c0 = arith.constant 0: index			%c0 = arith.constant 0: index

mlir/test/Integration/Dialect/Vector/CPU/test-shape-cast.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -test-lower-to-llvm \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	%f1 = arith.constant 1.0: f32			%f1 = arith.constant 1.0: f32
	%f2 = arith.constant 2.0: f32			%f2 = arith.constant 2.0: f32
	%f3 = arith.constant 3.0: f32			%f3 = arith.constant 3.0: f32

mlir/test/Integration/Dialect/Vector/CPU/test-shuffle.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -test-lower-to-llvm \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	%f1 = arith.constant 1.0: f32			%f1 = arith.constant 1.0: f32
	%f2 = arith.constant 2.0: f32			%f2 = arith.constant 2.0: f32
	%v1 = vector.broadcast %f1 : f32 to vector<2x4xf32>			%v1 = vector.broadcast %f1 : f32 to vector<2x4xf32>

mlir/test/Integration/Dialect/Vector/CPU/test-shuffle16x16.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf \			// RUN: mlir-opt %s \
	// RUN: -test-transform-dialect-interpreter \			// RUN: -test-transform-dialect-interpreter \
	// RUN: -test-transform-dialect-erase-schedule \			// RUN: -test-transform-dialect-erase-schedule -test-lower-to-llvm \| \
	// RUN: -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	%in = arith.constant dense<[[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0], [16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0], [32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 38.0, 39.0, 40.0, 41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0], [48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0], [64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0], [80.0, 81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0, 89.0, 90.0, 91.0, 92.0, 93.0, 94.0, 95.0], [96.0, 97.0, 98.0, 99.0, 100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0, 110.0, 111.0], [112.0, 113.0, 114.0, 115.0, 116.0, 117.0, 118.0, 119.0, 120.0, 121.0, 122.0, 123.0, 124.0, 125.0, 126.0, 127.0], [128.0, 129.0, 130.0, 131.0, 132.0, 133.0, 134.0, 135.0, 136.0, 137.0, 138.0, 139.0, 140.0, 141.0, 142.0, 143.0], [144.0, 145.0, 146.0, 147.0, 148.0, 149.0, 150.0, 151.0, 152.0, 153.0, 154.0, 155.0, 156.0, 157.0, 158.0, 159.0], [160.0, 161.0, 162.0, 163.0, 164.0, 165.0, 166.0, 167.0, 168.0, 169.0, 170.0, 171.0, 172.0, 173.0, 174.0, 175.0], [176.0, 177.0, 178.0, 179.0, 180.0, 181.0, 182.0, 183.0, 184.0, 185.0, 186.0, 187.0, 188.0, 189.0, 190.0, 191.0], [192.0, 193.0, 194.0, 195.0, 196.0, 197.0, 198.0, 199.0, 200.0, 201.0, 202.0, 203.0, 204.0, 205.0, 206.0, 207.0], [208.0, 209.0, 210.0, 211.0, 212.0, 213.0, 214.0, 215.0, 216.0, 217.0, 218.0, 219.0, 220.0, 221.0, 222.0, 223.0], [224.0, 225.0, 226.0, 227.0, 228.0, 229.0, 230.0, 231.0, 232.0, 233.0, 234.0, 235.0, 236.0, 237.0, 238.0, 239.0], [240.0, 241.0, 242.0, 243.0, 244.0, 245.0, 246.0, 247.0, 248.0, 249.0, 250.0, 251.0, 252.0, 253.0, 254.0, 255.0]]> : vector<16x16xf32>			%in = arith.constant dense<[[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0], [16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0], [32.0, 33.0, 34.0, 
35.0, 36.0, 37.0, 38.0, 39.0, 40.0, 41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0], [48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0], [64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0], [80.0, 81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0, 89.0, 90.0, 91.0, 92.0, 93.0, 94.0, 95.0], [96.0, 97.0, 98.0, 99.0, 100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0, 110.0, 111.0], [112.0, 113.0, 114.0, 115.0, 116.0, 117.0, 118.0, 119.0, 120.0, 121.0, 122.0, 123.0, 124.0, 125.0, 126.0, 127.0], [128.0, 129.0, 130.0, 131.0, 132.0, 133.0, 134.0, 135.0, 136.0, 137.0, 138.0, 139.0, 140.0, 141.0, 142.0, 143.0], [144.0, 145.0, 146.0, 147.0, 148.0, 149.0, 150.0, 151.0, 152.0, 153.0, 154.0, 155.0, 156.0, 157.0, 158.0, 159.0], [160.0, 161.0, 162.0, 163.0, 164.0, 165.0, 166.0, 167.0, 168.0, 169.0, 170.0, 171.0, 172.0, 173.0, 174.0, 175.0], [176.0, 177.0, 178.0, 179.0, 180.0, 181.0, 182.0, 183.0, 184.0, 185.0, 186.0, 187.0, 188.0, 189.0, 190.0, 191.0], [192.0, 193.0, 194.0, 195.0, 196.0, 197.0, 198.0, 199.0, 200.0, 201.0, 202.0, 203.0, 204.0, 205.0, 206.0, 207.0], [208.0, 209.0, 210.0, 211.0, 212.0, 213.0, 214.0, 215.0, 216.0, 217.0, 218.0, 219.0, 220.0, 221.0, 222.0, 223.0], [224.0, 225.0, 226.0, 227.0, 228.0, 229.0, 230.0, 231.0, 232.0, 233.0, 234.0, 235.0, 236.0, 237.0, 238.0, 239.0], [240.0, 241.0, 242.0, 243.0, 244.0, 245.0, 246.0, 247.0, 248.0, 249.0, 250.0, 251.0, 252.0, 253.0, 254.0, 255.0]]> : vector<16x16xf32>
	%0 = vector.transpose %in, [1, 0] : vector<16x16xf32> to vector<16x16xf32>			%0 = vector.transpose %in, [1, 0] : vector<16x16xf32> to vector<16x16xf32>
	vector.print %0 : vector<16x16xf32>			vector.print %0 : vector<16x16xf32>

mlir/test/Integration/Dialect/Vector/CPU/test-sparse-dot-matvec.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	// Illustrates an 8x8 Sparse Matrix x Vector implemented with only operations			// Illustrates an 8x8 Sparse Matrix x Vector implemented with only operations
	// of the vector dialect (and some std/scf). Essentially, this example performs			// of the vector dialect (and some std/scf). Essentially, this example performs
	// the following multiplication:			// the following multiplication:
	//			//

mlir/test/Integration/Dialect/Vector/CPU/test-sparse-saxpy-jagged-matvec.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	// Illustrates an 8x8 Sparse Matrix x Vector implemented with only operations			// Illustrates an 8x8 Sparse Matrix x Vector implemented with only operations
	// of the vector dialect (and some std/scf). Essentially, this example performs			// of the vector dialect (and some std/scf). Essentially, this example performs
	// the following multiplication:			// the following multiplication:
	//			//

mlir/test/Integration/Dialect/Vector/CPU/test-transpose.mlir

	// RUN: mlir-opt %s -convert-scf-to-cf -convert-vector-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \| \			// RUN: mlir-opt %s -test-lower-to-llvm \| \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils \| \			// RUN: -shared-libs=%mlir_c_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() {			func.func @entry() {
	%f0 = arith.constant 0.0: f32			%f0 = arith.constant 0.0: f32
	%f1 = arith.constant 1.0: f32			%f1 = arith.constant 1.0: f32
	%f2 = arith.constant 2.0: f32			%f2 = arith.constant 2.0: f32

mlir/test/Integration/Dialect/Vector/GPU/CUDA/test-reduction-distribute.mlir

	// RUN: mlir-opt %s -test-vector-warp-distribute="hoist-uniform distribute-transfer-write propagate-distribution" -canonicalize \|\			// RUN: mlir-opt %s -test-vector-warp-distribute="hoist-uniform distribute-transfer-write propagate-distribution" -canonicalize \|\
	// RUN: mlir-opt -test-vector-warp-distribute=rewrite-warp-ops-to-scf-if \|\			// RUN: mlir-opt -test-vector-warp-distribute=rewrite-warp-ops-to-scf-if \|\
	// RUN: mlir-opt -lower-affine -convert-scf-to-cf -convert-vector-to-llvm \			// RUN: mlir-opt -lower-affine -convert-vector-to-scf -convert-scf-to-cf -convert-vector-to-llvm \
	// RUN: -convert-arith-to-llvm -gpu-kernel-outlining \|\			// RUN: -convert-arith-to-llvm -gpu-kernel-outlining \|\
	// RUN: mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm,reconcile-unrealized-casts,gpu-to-cubin))' \|\			// RUN: mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm,reconcile-unrealized-casts,gpu-to-cubin))' \|\
	// RUN: mlir-opt -gpu-to-llvm -reconcile-unrealized-casts \|\			// RUN: mlir-opt -gpu-to-llvm -reconcile-unrealized-casts \|\
	// RUN: mlir-cpu-runner -e main -entry-point-result=void \			// RUN: mlir-cpu-runner -e main -entry-point-result=void \
	// RUN: -shared-libs=%mlir_cuda_runtime \			// RUN: -shared-libs=%mlir_cuda_runtime \
	// RUN: -shared-libs=%mlir_c_runner_utils \			// RUN: -shared-libs=%mlir_c_runner_utils \
	// RUN: -shared-libs=%mlir_runner_utils \| \			// RUN: -shared-libs=%mlir_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

mlir/test/Integration/Dialect/Vector/GPU/CUDA/test-warp-distribute.mlir

	// Run the test cases without distributing ops to test default lowering. Run			// Run the test cases without distributing ops to test default lowering. Run
	// everything on the same thread.			// everything on the same thread.
	// RUN: mlir-opt %s -test-vector-warp-distribute=rewrite-warp-ops-to-scf-if -canonicalize \| \			// RUN: mlir-opt %s -test-vector-warp-distribute=rewrite-warp-ops-to-scf-if -canonicalize \| \
	// RUN: mlir-opt -convert-scf-to-cf -convert-cf-to-llvm -convert-vector-to-llvm -convert-arith-to-llvm \			// RUN: mlir-opt -convert-vector-to-scf -convert-scf-to-cf -convert-cf-to-llvm -convert-vector-to-llvm -convert-arith-to-llvm \
	// RUN: -gpu-kernel-outlining \|\			// RUN: -gpu-kernel-outlining \|\
	// RUN: mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm,reconcile-unrealized-casts,gpu-to-cubin))' \|\			// RUN: mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm,reconcile-unrealized-casts,gpu-to-cubin))' \|\
	// RUN: mlir-opt -gpu-to-llvm -reconcile-unrealized-casts \|\			// RUN: mlir-opt -gpu-to-llvm -reconcile-unrealized-casts \|\
	// RUN: mlir-cpu-runner -e main -entry-point-result=void \			// RUN: mlir-cpu-runner -e main -entry-point-result=void \
	// RUN: -shared-libs=%mlir_cuda_runtime \			// RUN: -shared-libs=%mlir_cuda_runtime \
	// RUN: -shared-libs=%mlir_c_runner_utils \			// RUN: -shared-libs=%mlir_c_runner_utils \
	// RUN: -shared-libs=%mlir_runner_utils \| \			// RUN: -shared-libs=%mlir_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	// Run the same test cases with distribution and propagation.			// Run the same test cases with distribution and propagation.
	// RUN: mlir-opt %s -test-vector-warp-distribute="hoist-uniform distribute-transfer-write" \			// RUN: mlir-opt %s -test-vector-warp-distribute="hoist-uniform distribute-transfer-write" \
	// RUN: -test-vector-warp-distribute=rewrite-warp-ops-to-scf-if -canonicalize \| \			// RUN: -test-vector-warp-distribute=rewrite-warp-ops-to-scf-if -canonicalize \| \
	// RUN: mlir-opt -convert-scf-to-cf -convert-cf-to-llvm -convert-vector-to-llvm -convert-arith-to-llvm \			// RUN: mlir-opt -convert-vector-to-scf -convert-scf-to-cf -convert-cf-to-llvm -convert-vector-to-llvm -convert-arith-to-llvm \
	// RUN: -gpu-kernel-outlining \|\			// RUN: -gpu-kernel-outlining \|\
	// RUN: mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm,reconcile-unrealized-casts,gpu-to-cubin))' \|\			// RUN: mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm,reconcile-unrealized-casts,gpu-to-cubin))' \|\
	// RUN: mlir-opt -gpu-to-llvm -reconcile-unrealized-casts \|\			// RUN: mlir-opt -gpu-to-llvm -reconcile-unrealized-casts \|\
	// RUN: mlir-cpu-runner -e main -entry-point-result=void \			// RUN: mlir-cpu-runner -e main -entry-point-result=void \
	// RUN: -shared-libs=%mlir_cuda_runtime \			// RUN: -shared-libs=%mlir_cuda_runtime \
	// RUN: -shared-libs=%mlir_c_runner_utils \			// RUN: -shared-libs=%mlir_c_runner_utils \
	// RUN: -shared-libs=%mlir_runner_utils \| \			// RUN: -shared-libs=%mlir_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	// RUN: mlir-opt %s -test-vector-warp-distribute="hoist-uniform distribute-transfer-write propagate-distribution" \			// RUN: mlir-opt %s -test-vector-warp-distribute="hoist-uniform distribute-transfer-write propagate-distribution" \
	// RUN: -test-vector-warp-distribute=rewrite-warp-ops-to-scf-if -canonicalize \| \			// RUN: -test-vector-warp-distribute=rewrite-warp-ops-to-scf-if -canonicalize \| \
	// RUN: mlir-opt -convert-scf-to-cf -convert-cf-to-llvm -convert-vector-to-llvm -convert-arith-to-llvm \			// RUN: mlir-opt -convert-vector-to-scf -convert-scf-to-cf -convert-cf-to-llvm -convert-vector-to-llvm -convert-arith-to-llvm \
	// RUN: -gpu-kernel-outlining \|\			// RUN: -gpu-kernel-outlining \|\
	// RUN: mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm,reconcile-unrealized-casts,gpu-to-cubin))' \|\			// RUN: mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm,reconcile-unrealized-casts,gpu-to-cubin))' \|\
	// RUN: mlir-opt -gpu-to-llvm -reconcile-unrealized-casts \|\			// RUN: mlir-opt -gpu-to-llvm -reconcile-unrealized-casts \|\
	// RUN: mlir-cpu-runner -e main -entry-point-result=void \			// RUN: mlir-cpu-runner -e main -entry-point-result=void \
	// RUN: -shared-libs=%mlir_cuda_runtime \			// RUN: -shared-libs=%mlir_cuda_runtime \
	// RUN: -shared-libs=%mlir_c_runner_utils \			// RUN: -shared-libs=%mlir_c_runner_utils \
	// RUN: -shared-libs=%mlir_runner_utils \| \			// RUN: -shared-libs=%mlir_runner_utils \| \
	// RUN: FileCheck %s			// RUN: FileCheck %s

mlir/test/mlir-cpu-runner/math-polynomial-approx.mlir

- // RUN: mlir-opt %s -pass-pipeline="builtin.module(func.func(test-math-polynomial-approximation,convert-arith-to-llvm),convert-vector-to-llvm,func.func(convert-math-to-llvm),convert-func-to-llvm,reconcile-unrealized-casts)" \
+ // RUN: mlir-opt %s -pass-pipeline="builtin.module(func.func(test-math-polynomial-approximation,convert-arith-to-llvm),convert-vector-to-scf,convert-scf-to-cf,convert-cf-to-llvm,convert-vector-to-llvm,func.func(convert-math-to-llvm),convert-func-to-llvm,reconcile-unrealized-casts)" \
  // RUN: | mlir-cpu-runner \
  // RUN: -e main -entry-point-result=void -O0 \
  // RUN: -shared-libs=%mlir_c_runner_utils \
  // RUN: -shared-libs=%mlir_runner_utils \
  // RUN: | FileCheck %s

  // -------------------------------------------------------------------------- //
  // Tanh.

mlir/test/mlir-cpu-runner/test-expand-math-approx.mlir

- // RUN: mlir-opt %s -pass-pipeline="builtin.module(func.func(test-expand-math,convert-arith-to-llvm),convert-vector-to-llvm,func.func(convert-math-to-llvm),convert-func-to-llvm,reconcile-unrealized-casts)" \
+ // RUN: mlir-opt %s -pass-pipeline="builtin.module(func.func(test-expand-math,convert-arith-to-llvm),convert-vector-to-scf,convert-scf-to-cf,convert-cf-to-llvm,convert-vector-to-llvm,func.func(convert-math-to-llvm),convert-func-to-llvm,reconcile-unrealized-casts)" \
  // RUN: | mlir-cpu-runner \
  // RUN: -e main -entry-point-result=void -O0 \
  // RUN: -shared-libs=%mlir_c_runner_utils \
  // RUN: -shared-libs=%mlir_runner_utils \
  // RUN: | FileCheck %s

  // -------------------------------------------------------------------------- //
  // exp2f.

[mlir][VectorOps] Use SCF for vector.print and allow scalable vectors
Closed, Public

Diff 545990

mlir/include/mlir/Dialect/Vector/IR/VectorOps.td

mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp

mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp

mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir

mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir

mlir/test/Integration/Dialect/Arith/CPU/test-wide-int-emulation-compare-results-i16.mlir

mlir/test/Integration/Dialect/Arith/CPU/test-wide-int-emulation-constants-i16.mlir

mlir/test/Integration/Dialect/LLVMIR/CPU/X86/test-inline-asm-vector.mlir

mlir/test/Integration/Dialect/LLVMIR/CPU/test-vp-intrinsic.mlir

mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/test-sve.mlir

mlir/test/Integration/Dialect/Vector/CPU/X86Vector/test-dot.mlir

mlir/test/Integration/Dialect/Vector/CPU/X86Vector/test-mask-compress.mlir

mlir/test/Integration/Dialect/Vector/CPU/X86Vector/test-rsqrt.mlir

mlir/test/Integration/Dialect/Vector/CPU/X86Vector/test-vp2intersect-i32.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-0-d-vectors.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-broadcast.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-compress.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-constant-mask.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-contraction.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-create-mask-v4i1.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-create-mask.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-expand.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-extract-strided-slice.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-flat-transpose-col.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-flat-transpose-row.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-fma.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-gather.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-index-vectors.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-insert-strided-slice.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-maskedload.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-maskedstore.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-matrix-multiply-col.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-matrix-multiply-row.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-outerproduct-f32.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-outerproduct-i64.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-print-fp.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-print-int.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-realloc.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-reductions-f32-reassoc.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-reductions-f32.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-reductions-f64-reassoc.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-reductions-f64.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-reductions-i32.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-reductions-i4.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-reductions-i64.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-reductions-si4.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-reductions-ui4.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-scan.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-scatter.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-shape-cast.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-shuffle.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-shuffle16x16.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-sparse-dot-matvec.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-sparse-saxpy-jagged-matvec.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-transpose.mlir

mlir/test/Integration/Dialect/Vector/GPU/CUDA/test-reduction-distribute.mlir

mlir/test/Integration/Dialect/Vector/GPU/CUDA/test-warp-distribute.mlir

mlir/test/mlir-cpu-runner/math-polynomial-approx.mlir

mlir/test/mlir-cpu-runner/test-expand-math-approx.mlir
