- Create and move ops with ISA compatibility to arm_neon.intr.*
- Add arm_neon.intr.sdot
Details
- Reviewers
ftynse nicolasvasilache aartbik - Commits
- rGf5963944d97d: Add arm_neon.sdot operation
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
mlir/include/mlir/Dialect/ArmNeon/ArmNeon.td | ||
---|---|---|
106–108 | You need to encode this in verifier (potentially using the TypesMatchWith trait). Otherwise, the op accepts any combination, so (vector<4xi32>, vector<8xi8>, vector<8xi8>) -> vector<16xi32> is currently accepted. | |
114 | Nit: since b, c and a, res have pairwise equal types, you don't need to list all of them |
mlir/include/mlir/Dialect/ArmNeon/ArmNeon.td | ||
---|---|---|
101 | Re op semantics, we have a choice of using 1-1 mapping to ARM or making it more MLIR codegen friendly. Given the way we represent MLIR vectors, I'd much rather make the vector<4xi8> part explicit and use the ARM op as the means to hide the abstraction gap between 1-D flattened (HW-detail) and 2-D vectors (MLIR-representation): (vector<2xi32>, vector<2x4xi8>, vector<2x4xi8>) -> vector<2xi32> Now I see the issue here: vector<2x4xi8> is not a native LLVM type and it "won't just work" magically. Additionally, it is possible (likely) that we will also need better vector-level abstractions and canonicalizations for flattening / unflattening between n-D and 1-D. Also, note the vector.contract semantics is %lhs, %rhs, %acc. The intrinsics page linked in the op doc does not mandate a form (I understand the ISA does but, like intrinsics, MLIR is closer to user / programming level than HW in this case): uint8x8_t vadd_u8 (uint8x8_t a, uint8x8_t b) uint8x16_t vaddq_u8 (uint8x16_t a, uint8x16_t b) I would just represent it in the retargetable codegen-friendly way suggested above. | |
106–108 | Isn't the above enough? AllTypesMatch<["b", "c"]>, AllTypesMatch<["a", "res"]> | |
114 | I'd just make it consistent with vector.contract. |
mlir/include/mlir/Dialect/ArmNeon/ArmNeon.td | ||
---|---|---|
101 | It's MLIR, we don't have to choose :) We do want something that maps 1-1 to LLVM IR intrinsics for translation simplicity purposes. This doesn't prevent us from having a slightly higher-level op that is easier to target. So we can have arm_neon.sdot that works on 2D types _and_ an arm_neon.intr.sdot that works on 1D types and maps directly to the intrinsic plus a simple conversion between the two that flattens the vector. This isn't a no-op conversion that we used to have between the llvm and non-llvm version of the dialect as it actually inserts the flattening op. If we start having several such ops, we can automate the definition and generate conversions at the ODS level. I am leaning in a similar direction for other "intrinsic" dialects. | |
106–108 | Actually, the comment looks wrong (vector<4xi32>, vector<16xi8>, vector<16xi8>) -> vector<16xi32> wouldn't pass the AllTypesMatch<["a", "res"]> verifier. I'll revise my example to (vector<2xi32>, vector<16xi8>, vector<16xi8>) -> vector<2xi32>, which does pass the current verifier but should not. There is nothing that guarantees only co-indexed length values from VectorOfLengthAndType are chosen. |
mlir/include/mlir/Dialect/ArmNeon/ArmNeon.td | ||
---|---|---|
101 | WFM! Then I'd just suggest to make turn this op into arm_neon.intrin.sdot to signify it is an implementation detail and make another arm_neon.sdot that implements the codegen-friendly suggestion above (can be in a separate CL). | |
106–108 | Ah indeed, good catch. @asaadaldien here is an example from AVX512: def MaskRndScaleOp : AVX512_Op<"mask.rndscale", [NoSideEffect, AllTypesMatch<["src", "a", "dst"]>, TypesMatchWith<"imm has the same number of bits as elements in dst", "dst", "imm", "IntegerType::get($_self.getContext(), " "($_self.cast<VectorType>().getShape()[0]))">]> { ... |
mlir/include/mlir/Dialect/ArmNeon/ArmNeon.td | ||
---|---|---|
101 | @nicolasvasilache, If we split the dialect into arm_neon.* and arm_neon.intr.* will we need to write arm_neon.* ->arm_neon.* transformations ? I think most of these transformations if needed aren't layout specific and better to exist above at vector dialect level. The flat 1-d vector here aren't crossing vector.contract -> neon.sdot pattern boundary. | |
106–108 | Good catch @ftynse , Thanks @nicolasvasilache for the example. |
mlir/include/mlir/Dialect/ArmNeon/ArmNeon.td | ||
---|---|---|
101 | I would suggest to only have */intr dichotomy when it is strictly necessary. We will have to write in-dialect transformations. They will flatten the vectors, and I would expect the flattening to be common for different ArmNeon, but potentially different for other dialects that use vector types (e.g., GPU mmafragment is opaque and may be targeted from vectors). In this light, it makes sense to me to put the flattening in the dialect. This doesn't prevent us from having some VectorUtils that support it in a generic way and called by concrete dialects. |
mlir/include/mlir/Dialect/ArmNeon/ArmNeon.td | ||
---|---|---|
101 | If we have the arm_neon dialect operating on flattened vectors the only time we need to do this flattening is when dialect convert vector.op_x -> arm_neon.op_y, I am trying to understand why we need the dialect to exist in two groups arm_neon.op_x_with_nd_vectors and arm_neon.intr.op_with_isa_like_1d_vec ? |
mlir/include/mlir/Dialect/ArmNeon/ArmNeon.td | ||
---|---|---|
101 | We get two trivial conversions: vector.op_x_with_nd_vectors to arm_neon.op_x_with_nd_vectors followed by arm_neon.op_x_with_nd_vectors to arm_neon.op_x_with_1d_vectors - instead of one larger, non-trivial conversion. Having an nD abstraction also helps if we want to program at a level slightly above the intrinsics but not at vector dialect level, e.g., manual performance benchmarking. |
mlir/include/mlir/Dialect/ArmNeon/ArmNeon.td | ||
---|---|---|
101 | Added arm_neon.inter.* to ops we have so far.. Thanks @ftynse , @nicolasvasilache rethinking about I can see how useful the break down is: e.g vector.* -> arm_neon.op* dialect conversion can be done independent of vector length, arm.* -> arm_neon.intr.op* pattern rewrite is the part that is ISA aware and can have other ISA specific transformations (e.g vector-padding). |
nit: OneResult is a bit more consistent with naming above