This is an archive of the discontinued LLVM Phabricator instance.

[mlir][VectorOps] Use SCF for vector.print and allow scalable vectors
ClosedPublic

Authored by benmxwl-arm on Jul 28 2023, 3:48 AM.

Details

Summary

This patch splits the lowering of vector.print into two stages: first an
n-D print is converted into a loop of scalar prints of the elements, then a
second pass converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.

The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.

To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:

vector.print <comma>

lowers to

llvm.call @printComma() : () -> ()

The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).
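For a concrete picture of the two stages, here is a hand-written sketch (illustrative only; the exact IR emitted by the passes may differ in names, ordering, and the final punctuation assembly syntax) of printing a scalable 1-D vector:

```mlir
// Input: print a scalable vector (cannot be unrolled at compile time).
func.func @print_vec(%v : vector<[4]xi32>) {
  vector.print %v : vector<[4]xi32>
  return
}
```

After VectorToSCF this becomes, roughly, a loop of scalar prints plus punctuation ops:

```mlir
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c4 = arith.constant 4 : index
%vscale = vector.vscale
%dim = arith.muli %c4, %vscale : index   // runtime vector length
%last = arith.subi %dim, %c1 : index
vector.print <open>
scf.for %i = %c0 to %dim step %c1 {
  %el = vector.extractelement %v[%i : index] : vector<[4]xi32>
  vector.print %el : i32 punctuation <no_punctuation>
  %notLast = arith.cmpi ult, %i, %last : index
  scf.if %notLast {
    vector.print <comma>
  }
}
vector.print <close>
vector.print <newline>
```

VectorToLLVM then only has to rewrite the remaining scalar prints and the punctuation variants into the existing runtime calls (e.g. printComma()).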

Diff Detail

Event Timeline

benmxwl-arm created this revision.Jul 28 2023, 3:48 AM
Herald added a project: Restricted Project.
benmxwl-arm requested review of this revision.Jul 28 2023, 3:48 AM

Rebase, correct comments, and update some missed tests.

benmxwl-arm edited the summary of this revision.Jul 28 2023, 5:38 AM
kuhar added a comment.EditedJul 28 2023, 12:01 PM

I can see one test failure:

FAIL: MLIR :: Integration/Dialect/LLVMIR/CPU/X86/test-inline-asm-vector.mlir (210 of 2225)
******************** TEST 'MLIR :: Integration/Dialect/LLVMIR/CPU/X86/test-inline-asm-vector.mlir' FAILED ********************
Script:
--
: 'RUN: at line 1';   /llvm/relass/bin/mlir-opt /llvm/llvm-project/mlir/test/Integration/Dialect/LLVMIR/CPU/X86/test-inline-asm-vector.mlir -convert-vector-to-llvm |   /llvm/relass/bin/mlir-cpu-runner -e entry_point_with_all_constants -entry-point-result=void    -shared-libs=/llvm/relass/lib/libmlir_c_runner_utils.so
--
Exit Code: 1

Command Output (stderr):
--
loc("<stdin>":4:5): error: Dialect `vector' not found for custom op 'vector.print' 
could not parse the input IR
c-rhodes accepted this revision.Jul 31 2023, 2:58 AM

Thanks for the patch Ben, this will improve tests like the one added in D155839 and mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir that print scalable vectors. I've left a few minor comments but otherwise LGTM, aside from the test failure @kuhar mentioned. Please allow time for others to review.

mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
1420–1429

nit: splits

mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp
716

nit: a comment here like "shape cast to rank 1" would be helpful

mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir
566

nit: actual names like IS_NOT_LAST_ITER would be helpful in the tests

This revision is now accepted and ready to land.Jul 31 2023, 2:58 AM
  • Fix test-inline-asm-vector.mlir (thanks for the pointer)
  • Always extend odd integer sizes (non-pow2 or < i8) to avoid backend issues
  • Use actual names in tests
  • Fix nits
benmxwl-arm marked 3 inline comments as done.Jul 31 2023, 7:17 AM

Thanks for working on this @benmxwl-arm, this is a very nice and very welcome improvement for scalable vectors!

a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:

vector.print <comma>

lowers to

llvm.call @printComma() : () -> ()

While this avoids leaking the LLVM details higher up, it requires vector.print to evolve into a bit of a "Swiss knife" that can print Vector(s) and other things too :) It would be good to get an opinion from vector.print veterans - perhaps @aartbik or @rriddle ?

mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
2483

Why Optional? So that vector.print can print punctuation without anything else?

mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp
675

Why not llvm.call @printComma() : () -> ()?

mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir
623

How about 2d scalable vectors?

mlir/test/Integration/Dialect/LLVMIR/CPU/X86/test-inline-asm-vector.mlir
6

Which conversion fails if you keep this as llvm.func?

mlir/test/Integration/Dialect/Vector/CPU/test-compress.mlir
1

This is a bit unrelated. We should either switch all tests to -test-lower-to-llvm or none. IMO, it would be safer to keep the original pipeline. Switching to -test-lower-to-llvm could be proposed separately.

While this avoids leaking the LLVM details higher up, it requires vector.print to evolve into a bit of a "Swiss knife" that can print Vector(s) and other things too :) It would be good to get an opinion from vector.print veterans - perhaps aartbik or rriddle ?

Note that vector.print is already a bit of a "swiss knife", as it supports printing scalars (and is used for non-vector things), so this does not seem like a huge stretch :)

mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
2483

Yes, it has to be able to be used alone for the VectorToSCF pass, see the test outputs :)

mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp
675

Because that'd be introducing LLVM specifics before lowering to LLVM, which does not seem right (and is not done elsewhere).

mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir
623

I could add a test for the SCF output here (which I believe is valid), though currently >= 2D scalables can't be lowered any further.

mlir/test/Integration/Dialect/LLVMIR/CPU/X86/test-inline-asm-vector.mlir
6

I may be lowering it incorrectly, but it gets stuck with some index types and cf.brs that fail to lower to LLVM following something like:
-convert-vector-to-scf -convert-scf-to-cf -convert-cf-to-llvm -convert-vector-to-llvm -convert-arith-to-llvm -reconcile-unrealized-casts

mlir/test/Integration/Dialect/Vector/CPU/test-compress.mlir
1

It's mainly because these tests use memrefs now, and hand-rolling the pipeline is quite unwieldy (and the original can't be kept anyway).

Note that if I removed the punctuation attribute, the entire lowering would need to be done in VectorToSCF, directly calling the runtime functions, since individual calls to vector.print add newlines after each element (currently disabled via the punctuation attribute), which would be incorrect when not printing a scalar.
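To make the newline issue concrete, a sketch (with illustrative syntax that may differ from the final assembly format):

```mlir
// A plain scalar print implies a trailing newline, so printing the
// elements of a vector<3xi32> one by one would emit one line per element:
vector.print %e : i32
// With the punctuation attribute the implicit newline is suppressed,
// letting VectorToSCF place the separators itself:
vector.print %e : i32 punctuation <no_punctuation>
vector.print <comma>
```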

Thanks for adding this. I am quite happy that this (1) supports scalable vectors now and (2) uses loops instead of unrolling, which will actually avoid some of the code size issues we were seeing.
I would like to see a bit more explanation in the vector op documentation itself, please.

mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
2483

It feels like the documentation needs to be updated a bit more on this. If I understand correctly, we now have the "pure vector print that still implies punctuation", and then after lowering we get into the semantics of printing the punctuation through the attribute and scalars. I understand why you picked this (not having the llvm dep early), but it is something that could use some explanation.

2497–2499

can we add another example with punctuation also now?

2500–2501

broken is a bit ambiguous, can we use something like decomposed or so?

mlir/include/mlir/IR/BuiltinTypes.td
378 ↗(On Diff #545643)

this feels like it should be in a separate CL

mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
1419–1420

Now that you are here, this is no longer proof-of-concept, but simply "Lowering implementation ...."

mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-1d.mlir
1 ↗(On Diff #545643)

this seems to repeat the pass?

  • Update vector.print documentation
  • Remove changes to BuiltinTypes.td
  • Add 2D scalable test case in vector-to-scf.mlir
  • Address miscellaneous comments :)
benmxwl-arm marked 7 inline comments as done.Aug 1 2023, 3:25 AM
aartbik accepted this revision.Aug 1 2023, 12:41 PM
aartbik added inline comments.
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
1441

since you are not using the result of the cast, just use isa?

awarzynski added inline comments.Aug 2 2023, 1:50 AM
mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir
623

To me that's a good reason to completely disallow 2D scalables. If we are not going to use it, it's effectively dead code 🤔 . This way we are much clearer about what's supported and what isn't.

benmxwl-arm added inline comments.Aug 2 2023, 2:18 AM
mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir
623

It does not require extra code to support here (it comes as part of already supporting n-D vectors + scalable vectors). It's just that currently 2D+ scalable vectors can't be lowered to LLVM, so the lowering actually fails before anything specific to vector.print (e.g. at the arith.constant dense<0.0> : vector<[4]x[4]xf32> in the test case).

awarzynski accepted this revision.Aug 2 2023, 7:09 AM

LGTM, thanks! (% request for a few more comments)

mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp
682–683

I am realising that "dynamic" may mean different things depending on the context. I usually think of "dynamically" vs "statically" shaped tensors/memrefs. But that's not what you had in mind, is it? Could you elaborate a bit? In particular, what is the actual limitation? With e.g. references to vector.extract vs vector.extractelement?

You may also want to refer to:

mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir
623

This is a good point, but I would still be hesitant to allow vector.print %arg0 : vector<[4]x[4]xf32> anywhere. We know that it will never be lowered to anything useful (well, there's nothing on the horizon). In cases like this I try to follow the principle of least surprises :)

Having said that, I appreciate that it's a bit weird to impose such limitations at the Vector dialect level. Please go with whichever approach you prefer.

benmxwl-arm updated this revision to Diff 546863.EditedAug 3 2023, 7:38 AM

Thanks for the reviews and approvals, however, I've decided to rework the patch a little.
Rather than spilling to a memref, I now do a simple reshape of n-D vectors (which then allows indexing with SSA values via vector.extractelement).
This explicitly does not support 2D scalable vectors, but as those can't be lowered currently this is no loss.

The main advantage of this is it keeps the lowering pipeline simple, so I've now removed the few -test-lower-to-llvms I added.

This did however uncover a bug in the vector.extract folds, so this patch now depends on D157003.
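The reworked n-D handling can be sketched as follows (illustrative IR, not the pass's verbatim output; the punctuation syntax is assumed as in the summary): the vector is flattened to rank 1 with vector.shape_cast, after which elements are addressable by SSA index values via vector.extractelement, with no memref round-trip needed:

```mlir
// Flatten a fixed-size 2-D vector to rank 1 so its elements can be
// indexed with SSA values inside an scf.for loop.
%flat = vector.shape_cast %v : vector<2x3xf32> to vector<6xf32>
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c6 = arith.constant 6 : index
scf.for %i = %c0 to %c6 step %c1 {
  %el = vector.extractelement %flat[%i : index] : vector<6xf32>
  vector.print %el : f32 punctuation <no_punctuation>
}
```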

  • Fix an accidental use-after-free
  • Rebase on main
awarzynski accepted this revision.Aug 8 2023, 2:50 AM

Thanks for the reviews and approvals, however, I've decided to rework the patch a little.

Fantastic refactor, thank you Ben! This simplifies your original implementation and makes it less intrusive (thinking about the tests). I've left one small comment, but otherwise re-LGTM!

Btw, as this latest update doesn't really affect the overall direction/logic (which has already been approved by 3 reviewers), I think that it's fine to land it without waiting for further re-approvals ;-)

mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp
721–728

I would bump this to the very top. In general, it's good to have this sort of assumption documented and enforced quite early on (as opposed to mixed in with regular code). Not always possible, but should be fine in this case.

benmxwl-arm marked an inline comment as done.Aug 9 2023, 3:00 AM

Reverted due to test failures in the MLIR python bindings: https://lab.llvm.org/buildbot/#/builders/220/builds/25791
Will address those before re-landing.

Relanded after updating the few test failures I could find (Python bindings and a few CUDA/GPU tests). I've tried to go over all the tests I can find (though I am mindful there are tests for hardware I can't run).

mehdi_amini added a comment.EditedAug 9 2023, 7:38 PM

Reverted because of the broken test; you tried to fix it as a follow-up but that didn't work: it actually made it worse. The fix wasn't correct anyway; the problem is a parsing error where vector.print <open> wouldn't parse back.

$ echo 'vector.print <open>' | bin/mlir-opt
<stdin>:1:13: error: expected operation name in quotes
vector.print <open>
            ^
mehdi_amini added inline comments.Aug 9 2023, 7:56 PM
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
1436–1437

There isn't necessarily a parent module op I believe.

Sorry for the inconvenience and thanks for reverting (sorry I missed the continued failures).

I see the issue with the ambiguous assembly format and have a fix for that.
I'm just now trying to ensure I can at least run the pipeline for gpu-to-cubin locally to verify it will build.

Hi @mehdi_amini,

Reverted because of the broken test

Sorry about the inconvenience.

We are actually struggling to reproduce the failure in gpu-to-cubin.mlir. These CUDA integration tests use pipelines (e.g. gpu-to-cubin) that just won't build without CUDA runtime/drivers (which we don't have). Perhaps we're missing something obvious, but so far this is proving quite tricky.

Do you know anyone able to and willing to share an mlir-print-ir-after-all dump with us? :) Perhaps there are some instructions somewhere on how to run these tests using free tools? Just to clarify - we only want to/need to compile these tests. We don't need to run them (i.e. with mlir-cpu-runner).

@benmxwl-arm will shortly update this PR with a fix for the ASM format, but "gpu-to-cubin.mlir" is likely to continue failing. I'm not sure how to resolve that without some external help.

benmxwl-arm reopened this revision.Aug 10 2023, 7:00 AM
This revision is now accepted and ready to land.Aug 10 2023, 7:00 AM
  • Check parent module exists (required for print, note that getting the parent this way is not changed from the previous implementation)
  • Ensure assembly format is round-trippable
  • Best guess fix for gpu-to-cubin.mlir (no luck testing this locally)
mehdi_amini added a comment.EditedAug 11 2023, 12:39 AM

We are actually struggling to reproduce the failure in gpu-to-cubin.mlir. These CUDA integration tests use pipelines (e.g. gpu-to-cubin) that just won't build without CUDA runtime/drivers (which we don't have). Perhaps we're missing something obvious, but so far this is proving quite tricky.

You can't run them, however you should be able to compile everything (assuming you enabled the NVPTX target).
You can just copy the test and invoke it from the regular test locations (the Integration CUDA tests will be filtered out if you don't have CUDA), or just call mlir-opt directly; for example, there shouldn't be any issue doing:

bin/mlir-opt /mlir/test/Integration/GPU/CUDA/gpu-to-cubin.mlir  \
| bin/mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm,gpu-to-cubin))' \
| bin/mlir-opt -gpu-to-llvm

(this is the test's RUN description)

Patched this revision and ran locally:

PASS: MLIR :: Integration/GPU/CUDA/gpu-to-cubin.mlir (166 of 2066)

Just try to push at a time you can keep an eye on the bot if you wanna be safe!

Sorry for the inconvenience and thanks for reverting (sorry I missed the continued failures).

It's very easy to miss, because the bot won't send a notification when it is already broken, only when it goes from green to red.

benmxwl-arm added a comment.EditedAug 11 2023, 2:10 AM

Thanks a lot for the help, and checking that test! I'll definitely be keeping a close eye this time.

The extra complication with the gpu-to-cubin pass is that it's gated under MLIR_ENABLE_CUDA_RUNNER and links with the CUDA libraries, which we didn't have.

It's very easy to miss, because the bot won't send a notification when it is already broken, only when it goes from green to red.

I see that now 😅, I was expecting a notification.