This is an archive of the discontinued LLVM Phabricator instance.

[mlir][llvm] Add vector insert/extract intrinsics
ClosedPublic

Authored by jsetoain on Jun 6 2022, 4:05 AM.

Details

Summary

These intrinsics will be needed to convert between fixed-length vectors
and scalable vectors.

This operation will be needed for VLS (vector-length specific)
vectorization, when interfacing with vector functions or intrinsics that
take scalable vectors as operands in a context where the length of our
vectors is known or assumed at compile time, but we still want to
generate scalable vector instructions.

Diff Detail

Event Timeline

jsetoain created this revision.Jun 6 2022, 4:05 AM
Herald added a project: Restricted Project. · View Herald Transcript
jsetoain requested review of this revision.Jun 6 2022, 4:05 AM
jsetoain updated this revision to Diff 434448.Jun 6 2022, 5:43 AM

Bad merge

LGTM! A couple of comments

mlir/include/mlir/Dialect/LLVMIR/LLVMIntrinsicOps.td
440
%0 = llvm.intr.experimental.vector.insert %arg0, %arg2[0] : vector<8xf32> into vector<[4]xf32>

I didn't know we could insert/extract larger multiples of the fixed part of the scalable type. That's interesting.
I think we should add custom verifiers to make sure the fixed type is the same as the fixed part of the scalable type or a multiple of it. I guess we should also check for the upper bound of these multiples based on the variants defined in LLVM.

jsetoain added inline comments.Jun 7 2022, 2:26 AM
mlir/include/mlir/Dialect/LLVMIR/LLVMIntrinsicOps.td
440

Yes! The idea is to be able to interface VLS code with VLA functions. I'm writing another patch for shape_cast that will verify that the cast makes sense. The easiest restriction, for instance, is that the fixed-length vector must be a multiple of the scalable one, but what "a multiple" means gets complicated quickly if we think about multi-rank vectors (I'm working on it). As for checking the upper bound, since that's going to be architecture-dependent and we need this to be a bit higher level, I believe we should defer that to the architecture-dependent stages of the pipeline. The whole operation is already treading on a thin layer of unverifiability, we are assuming the programmer/compiler know what they're doing, so it should be acceptable. I want to elaborate on all these things (and more) in the post I promised, I just need to finish the shape_cast patch so we can discuss on the basis of a use case.

dcaballe added inline comments.Jun 7 2022, 12:05 PM
mlir/include/mlir/Dialect/LLVMIR/LLVMIntrinsicOps.td
440

Take your time! In the meantime, let me clarify and ask a few questions:

  1. I assume that the shape_cast work and the multi-rank vector variants of the insert/extract operations would go to the Vector dialect, not to the LLVM dialect, right? Multi-dimensional vectors should have been unrolled or lowered to multi-dimensional arrays.
  2. Re verification (upper bound and fixed-scalable vector properties), the intrinsics in the LLVM dialect should faithfully represent the ones defined in LLVM. If we generated a variant that is not defined in LLVM (let's say <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.v40960i32(<vscale x 4 x i32>, <40960 x i32>, i64 immarg), we shouldn't accept it in the LLVM dialect because we would generate invalid LLVM IR. I think we should only allow valid variants in the LLVM dialect (i.e., variants that are defined in LLVM). Any transformations aimed at "legalizing" high-level versions of these ops should happen before reaching the LLVM dialect.

Perhaps I'm missing something... :)

jsetoain added inline comments.Jun 8 2022, 2:01 AM
mlir/include/mlir/Dialect/LLVMIR/LLVMIntrinsicOps.td
440

Let me elaborate a bit. The idea is to lower "non-trivial" fl->sv/sv->fl shape_casts to fl->fl/sv->sv "non-trivial" shape_cast plus a "trivial" fl->sv/sv->fl.

fl->fl and sv->sv are lowered as they do now (down to a series of insert/extracts, and then to LLVM), but the "trivial" fl->sv/sv->fl are replaced on the vector->llvm lowering by one of the vector.insert/extract intrinsics. There are other alternatives, but I think this one is the best.

By "trivial", I mean things like vector<4x4xf32> to vector<[4]xf32> or vector<4xf32> (which is equivalent to vector<1x4xf32>) to vector<[4]xf32> (and vice-versa). I call these trivial because a vector<[4]xf32> is equivalent to a vector<n x 4 x f32> where n is determined at runtime, hence things like vector<n x shape x type> are trivially "mappable" to a vector<[shape] x type> if we assume that the bitsize of your runtime vector is n x sizeof(type).

But we also need to be able to cast something like vector<2x8xi8> to vector<[16]xi8> by doing a cast from vector<2x8xi8> to vector<16xi8> and another (trivial) one from vector<16xi8> to vector<[16]xi8>; or the 256b VLS equivalent: vector<2x2x8xi8> to vector<[16]xi8> through vector<2x16xi8>. This would allow to interface VLS GEMM code with xMMLA SVE intrinsics, for instance.

Re.: 2, after doing a few tests, it looks like, indeed, vector.insert/extract have a limit to the fixed size of 2^17 bits. I don't understand why but, since it's there, I agree we need to add verification to avoid producing invalid LLVM Dialect that can't be translated to LLVM IR. It was I who was missing something :-P

Since it's taking so long and this is probably the worst place to have this discussion, I will open a discourse thread and we can talk about this over there :-)

jsetoain updated this revision to Diff 437151.EditedJun 15 2022, 7:13 AM

Accept fixed-length insert/extract

bsmith added a subscriber: bsmith.Jun 16 2022, 8:22 AM

Hopefully soon the vector.insert/extract intrinsics will be moving out of the experimental namespace (https://reviews.llvm.org/D127976). Just a heads up, depending on which patch lands first.

Brilliant! Thanks for heads up, Bradley!

jsetoain updated this revision to Diff 437610.Jun 16 2022, 11:01 AM

Adding verification of vector sizes, allowing more modes of operation.

jsetoain marked an inline comment as done.Jun 16 2022, 11:04 AM

I believe this change addresses the issue with vectors being too long. I also changed the constraints to match those of the LLVM intrinsic.

jsetoain marked an inline comment as done.Jun 16 2022, 11:04 AM
jsetoain updated this revision to Diff 437676.Jun 16 2022, 1:03 PM

Add missing constraints.

Matt added a subscriber: Matt.Jun 16 2022, 4:45 PM
dcaballe accepted this revision.Jun 17 2022, 12:13 PM

LGTM! Thank you so much for addressing the comments and for the discussion!

This revision is now accepted and ready to land.Jun 17 2022, 12:13 PM
jsetoain updated this revision to Diff 440173.Jun 27 2022, 4:55 AM

Rebase on top of main with LLVM vector.insert/extract outside of experimental

This revision was automatically updated to reflect the committed changes.