The way vector.create_mask is currently lowered is
vector-length-dependent, and therefore incompatible with scalable vector
types. This patch adds an alternative lowering path for create_mask
operations that return a scalable vector mask.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Please block the patch for now, this is just something I'm thinking about, I'm also opening a post in discourse to get some feedback.
Fix create mask folder. Move create_mask on scalable vectors from
VectorTransforms to ConvertVectorToLLVM
Switching to llvm.intr.experimental.stepvector for lowering. This fixes
consistency issues with fixed-length create_mask operations.
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp | ||
---|---|---|
918–919 | this optimisation seems odd, I can't imagine there's any hardware out there with vectors approaching 2^64 elements (and 2^32 for that matter). Can this option be removed and always default i32? | |
922 | getScalableVectorType? | |
927 | I know this is based on the fixed lowering, but I wonder if this should be ult. |
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp | ||
---|---|---|
918–919 | The default is i32 (i.e.: optimize indices), but the fixed-length create_mask operation supports i64 indices so I believe it's only fair, for consistency, to support them for scalable vectors as well. Is it worth removing the option altogether? probably, but that should be a different patch :-) | |
922 | It's broken and I'm not sure how to fix it. It's in the backlog. | |
927 | If you do ult you have the same "wrap around" problem. It has to be signed in case the index is negative. There's a discussion here: [[ https://reviews.llvm.org/D116069 | [mlir][vector] Allow values outside of [0; dim-size] in create_mask]] about why this is the preferred behavior. |
Does this need a round-trip test in mlir/test/Dialect/Vector/ops.mlir?
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp | ||
---|---|---|
918–919 |
I agree, better to be consistent and remove in another patch. Just an observation :) | |
927 |
the vector.create_mask -> vector.constant_mask canonicalization for negative values should happen before this lowering? |
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp | ||
---|---|---|
927 | I've included the right flow and constrains for create_mask -> constant_mask canonicalization. |
mlir/include/mlir/Dialect/Vector/Transforms/VectorTransforms.h | ||
---|---|---|
94 | nit: an | |
mlir/lib/Dialect/Vector/IR/VectorOps.cpp | ||
4234 | == 0? Or to be 0? https://mlir.llvm.org/docs/Dialects/Vector/#vectorconstant_mask-mlirvectorconstantmaskop
| |
mlir/test/Dialect/Vector/ops.mlir | ||
375 | nit: unrelated change |
LGTM, but I'm pretty new around these parts so I'll leave it to another reviewer to accept
@ftynse Hi Alex, I just wanted to kindly remind you about this patch, after the discussion in discourse nobody else seems to have anything against it, and it's currently blocking a stack of approved patches. Thank you!
LGTM, let's wait for Alex.
Few remaining nits
mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td | ||
---|---|---|
1758 | period at end | |
mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp | ||
615 | period at end | |
mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir | ||
1466 | although this will match genbool_1d_scalable is a better LABEL! |
Thanks for the contribution, Javier! Some comments inline.
mlir/include/mlir/Conversion/VectorToLLVM/ConvertVectorToLLVM.h | ||
---|---|---|
68 | Add doc about what indexOptimizations is actually enabling? | |
mlir/include/mlir/Dialect/Vector/Transforms/VectorTransforms.h | ||
99 | Now that we are making this utility public, I wonder if it would make more sense to move it somewhere else since it's not a vector specific transformation, right? Maybe we could move it to the builder class? It looks like a builder method to me. | |
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp | ||
904 | Would it make sense to align this lowering with the non-scalable version (i.e., moving it to Vector Transforms)? I think create_mask is a relatively high level op that would make sense to lower to something simpler before we lower it to LLVM. That would align with the approach that we follow for similar vector ops and would make the LLVM lowering simpler (which is complex already). (Hopefully I'm not missing any context. Feel free to ignore this comment you already discussed this). | |
918–919 | +1 to removing this. It's a bit odd that we shrink the index type as part of the lowering. I think that kind of transformation should happen as a separate step before the lowering to LLVM. |
Address reviewer comments
mlir/include/mlir/Dialect/Vector/Transforms/VectorTransforms.h | ||
---|---|---|
99 | That creates an awkward dependency between common MLIR and the Arithmetic dialect, but it does make a lot of sense to move this to Arithmetic Utils (I've also renamed it to match similar utilities). | |
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp | ||
904 | The reason to separate them was that the scalable lowering depends on LLVM IR much earlier than the fixed-length version. In a way, create_mask for scalable vectors is a lower level operations than create_mask for fixed-length vectors. If we move scalable create_mask to Vector Transforms, we add an additional dependency there. If we want to unify both lowerings, I think it would make more sense to move create_mask to conversions. There was some discussion in the thread I created for this, and the conclusion was that having two lowerings for one operation was not an issue. If that not the case any more or you think it doesn't apply to this case, I'm happy to move things around. It's a trivial change :-) |
mlir/include/mlir/Conversion/VectorToLLVM/ConvertVectorToLLVM.h | ||
---|---|---|
66 | Why not call it assume32BitIndices then? | |
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp | ||
910 | Nit: no need for explicit mlir:: here. | |
mlir/lib/Dialect/Vector/IR/VectorOps.cpp | ||
4278–4279 | Use matchPattern(m_constantInt(...)) instead of explicitly matching for arith.constant here. |
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp | ||
---|---|---|
918–919 | Do you all know that the bitwidth of index is parameterized through the data layout mechanism? You can also define this as a full-fledged ConversionPattern, get hold of a TypeConverter instance and make it convert the actual IndexType to get you the integer type of the expected bitwidth. Anything else will run into type mismatches sooner or later. |
mlir/include/mlir/Conversion/VectorToLLVM/ConvertVectorToLLVM.h | ||
---|---|---|
66 | The only reason is that it is the name used elsewhere, but I agree it's not a great name. Once this has landed, I will push a NFC patch changing the name everywhere, probably to something like assume32BitVectorIndices to avoid confusion. | |
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp | ||
918–919 | I believe this is handled independently from that mechanism because you might want to have IndexType to i64 conversion, and yet generate i32 indices in this very particular case. While your loop indices might go well beyond 2^32, the length of your physical vector is unlikely to do so. |
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp | ||
---|---|---|
904 | If it was discussed already, it's fine then! We can align both lowerings in the future. We may have a better picture once we introduce further masking support.
What is missing on the MLIR side to be able to do the lowering? StepVectorOp? |
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp | ||
---|---|---|
904 | StepVectorOp would unify the lowering, but, unless we find more use cases, I'm not sure its worth the change. The documentation of llvm.experimental.stepvector says that the recommended way to do this for fixed-length vectors is by using a constant vector. This would lead to a unification on the lowering of create_mask, but then a dual lowering for StepVectorOp (to the intrinsic or the constant vector, depending on scalability of result). I'd say it's just marginally better, if that. If as we expand mask creation and manipulation in MLIR we find this op is useful, I'll take care of unifying the lowering of this op. |
Why not call it assume32BitIndices then?