Here we enable some of shuffle lowering with TBL instruction with SVE and SVE2 for indexing for one register and TBL version for SVE2 while indexing to both registers.
Details
Diff Detail
Event Timeline
Hi @dtemirbulatov, I only had a glance so haven't reviewed this properly yet, but already left you a few comments and questions.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25910–25911 | This could do with a code comment describing why we don't do this when we don't know the exact SVE size. I think there are several cases to distinguish here:
<2 x i64> %x, <2 x i64> %y, <4 x i32> <0, 1, 2, 3> if SVE VL = 256, then we'd basically insert the <2 x i64> vector value into a conceptual <4 x i64> register where only the first two lanes are relevant, so the offsets must be: <0, 1, 4, 5> if SVE VL = 512, then we'd basically insert the <2 x i64> vector value into a conceptual <8 x i64> register where only the first two lanes are relevant, so the offsets must be: <0, 1, 8, 9> So we'd somehow need to do something like this: <0, 1, 2, 3> + <0, 0, VL, VL> | |
25911 | nit: tab? | |
25913 | It's not really clear to me what Swap and IsUndefOrZero are supposed to do. I would have expected it to look at the indices rather than the source vector operands. Can you add a comment explaining this logic? | |
25956–25967 | nit: what about this? | |
llvm/test/CodeGen/AArch64/sve-fixed-length-permute-rev.ll | ||
396–397 | Seems like these changes can be committed separately. | |
llvm/test/CodeGen/AArch64/sve-fixed-length-vector-shuffle.ll | ||
944 | This needs some check lines for the constant. | |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-vector-shuffle.ll | ||
388 | Can you make it somehow clear form the name of the function, or by inlining the attribute here (e.g. @shuffle_v4i16_tbl2(ptr %a, ptr %b) "target-features"="+sve2" {) that this relates to SVE2? |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25913 | Ok, I will comment for those variables. Also, for example. If the mask is refers to only one operand then second operand would be already undefined at the time of lowering. I think we don't have to check the mask here. |
Hi Dinar,
I re-wrote your code in the Lowering to something I could understand better.
I will still spend some time trying to understand the mask logic and see if the new scalable vector is correct.
But I believe it is good to share my comments for now
+ // Avoid producing TBL and TBL2 instructions if we don't know + // SVE register size or minimal is unequal to maximum size. + if (!MaxSVESize || MinSVESize != MaxSVESize) + return SDValue(); - return SDValue(); + // Element type is ??? + unsigned BitsPerElt = VT.getVectorElementType().getSizeInBits(); + if (BitsPerElt != 8 && BitsPerElt != 16 && BitsPerElt != 32 && + BitsPerElt != 64) + return SDValue(); + + bool Swap = false; + if (Op1.isUndef() || isZerosVector(Op1.getNode())) { + // Swapped shuffle operands, but we have to recalculate indexes afterwards. + std::swap(Op1, Op2); + Swap = true; + } + + + bool IsUndefOrZero = Op2.isUndef() || isZerosVector(Op2.getNode()); + unsigned NumElts = VT.getVectorNumElements(); + unsigned IndexLen = MinSVESize / BitsPerElt; + unsigned FillElements = IndexLen - NumElts; + + // Set the Mask with the correct indexes + SmallVector<SDValue, 8> TBLMask; + for (int Val : ShuffleMask) { + unsigned Offset = Val; + if (Swap) + Offset = Offset < NumElts ? Offset + NumElts : Offset - NumElts; + // why we need to set the mask to 255 when one of the vectors is zero/undef? + else if(Offset >= NumElts){ + if(IsUndefOrZero) + Offset = 255; + else if (IndexLen > NumElts) + Offset += IndexLen - NumElts; + } + TBLMask.push_back(DAG.getConstant(Offset, DL, MVT::i64)); + } + // Filling the rest of the mask with 255 + for (unsigned i = 0; i < FillElements; ++i) + TBLMask.push_back(DAG.getConstant(255, DL, MVT::i64)); + + // Building a scalable vector for the suffle. So we can use the sve instructions + EVT MaskEltType = EVT::getIntegerVT(*DAG.getContext(), BitsPerElt); + EVT MaskType = EVT::getVectorVT(*DAG.getContext(), MaskEltType, IndexLen); + EVT MaskContainerVT = getContainerForFixedLengthVector(DAG, MaskType); + SDValue VecMask = + DAG.getBuildVector(MaskType, DL, ArrayRef(TBLMask.data(), IndexLen)); + SDValue SVEMask = convertToScalableVector(DAG, MaskContainerVT, VecMask); + + SDValue Shuffle; + if (IsUndefOrZero || Swap) { + Shuffle = convertFromScalableVector( + DAG, VT, + DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, ContainerVT, + DAG.getConstant(Intrinsic::aarch64_sve_tbl, DL, MVT::i32), + Op1, SVEMask)); + } else { + if (Subtarget->hasSVE2()) + Shuffle = convertFromScalableVector( + DAG, VT, + DAG.getNode( + ISD::INTRINSIC_WO_CHAIN, DL, ContainerVT, + DAG.getConstant(Intrinsic::aarch64_sve_tbl2, DL, MVT::i32), Op1, + Op2, SVEMask)); + else + return SDValue(); + } + return DAG.getNode(ISD::BITCAST, DL, Op.getValueType(), Shuffle);
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25927–25928 | Why do we need to test this? What other value it can have? | |
25940 | For me this tests is a bit odd. Maybe this will help to understand if there is any hidden problem with these ifs. I re-wrote that to something like: // Set the Mask with the correct indexes SmallVector<SDValue, 8> TBLMask; for (int Val : ShuffleMask) { unsigned Offset = Val; if (Swap) Offset = Offset < NumElts ? Offset + NumElts : Offset - NumElts; // why we need to set the mask to 255 when one of the vectors is zero/undef? else if(Offset >= NumElts){ if(IsUndefOrZero) Offset = 255; else if (IndexLen > NumElts) Offset += IndexLen - NumElts; } TBLMask.push_back(DAG.getConstant(Offset, DL, MVT::i64)); } | |
25944 | I would suggest add a comment here. | |
25947 | Are you here changing a fixed vector to a scalable vector to use tbl? |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25910–25911 | I will address run-time index construction in the following patch. | |
25927–25928 | yes, The element type chack was already done in useSVEForFixedLengthVectorVT() function. | |
25947 | Right, as those instructions accept only scalable type operand and result. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25940 | Thanks for finding this, I think now that "IndexLen> NumElts" logic/transform was not correct. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25913 | There is an implicit assumption that a DAGCombine (or perhaps an InstCombine) has rewritten the second operand to be undef. It seems safer to look at the values in the Mask instead. | |
25929 | I think you need to call this Op2IsUndefOrZero | |
25941–25945 | What should happen if the vector type is <256 x i8> and IsUndefOrZero is false ? | |
25948–25949 | I'm curious, why is that better? Also, as my comment above suggests, for <256 x i8> vectors, 255 is just the last element of the first data-vector. There is also some test-coverage missing here. | |
25959 | I think this condition can be removed? (if it has swapped, then the second vector must be zero or undef?) | |
25964 | What is the type of VecMask here? How can the type of VecMask here (single-vector TBL instruction) be the same as for the case of the SVE2 two-vector TBL instruction? | |
25965–25966 | nit: } else if (Subtarget->hasSVE2()) { ... } else return SDValue(); ? |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25948–25949 | I think that if we leave uninitialized remaining out of scope indexes then they will end up with 0 value and means extra work since 0 element almost always means 0 lane and with 255 value probably hardware could understand this is out of scope value, except <256 x i8> case which a it rarely than valid 0 index. The last available value for index here is 255 then it flips to 0. | |
25959 | No, IsUndefOrZero tells us that only operand 0 is used. I will add a test and rename the variable. | |
25964 | This is not correct, According to specification index size is similar for TBL, TBL2. |
Rebased, fixed remarks, added ability to avoid to produce tbl, tbl2 instruction if index is not representable index value more than 255.
Corrected sve-streaming-mode-fixed-length-vector-shuffle.ll test by removed unused parameters from functions.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25910–25911 | It probably makes sense to move this functionality into its own function. | |
25923 | You seem to have removed checks to ask if Op1 is poison/undef or zeroinitializer. The way you've written the code now, Op1IsUndefOrZero should be renamed instead to Op1IsUnused. (same for Op2IsUnused) | |
25930 | Is the condition && (Val < NumElts * 2) needed? This seems already covered by the if (NumElts > IndexLen) check above on line 25844. | |
25933 | It's probably easier to update the ShuffleMask here. That simplifies the logic below as it avoids having to constantly distinguish between the Op*IsUndefOrZero cases. | |
25937 | Hardcoding a check for '128' doesn't seem right to me. It all depends on what can be represented in the element type of the instruction, i.e. for i8's the max value is 255. That means when targeting vscale_range(16, 16), then 255 is a valid index into the first source vector. | |
25938 | It doesn't seem right to look at the number of indices in the mask. I think rather it should be about the maximum index value in the mask? | |
25948–25949 | While it may be rare, it must be correct for all cases. If the LLVM IR instruction requests an element from a zeroinitializer vector is selected, then it must make sure to do so. That means that 255 may not always be the correct value. Maybe it should just return SDValue() for that case, or explicitly materialize a zeroinitializer vector to select the value from. | |
25955 | While not wrong, I don't really like '255' because it is a hard-coded value. For i8 elements, it may well be a valid index. For i16 or larger elements, there's no need to hard-code 255. Maybe just use '(1 << elementSize) - 1'? | |
25965 | For a shufflevector that selects IndexLen elements from Op1/Op2, this code seems to be filling the first VL - IndexLen elements with some initializer (255 in this case). It seems odd to me that this loop starts at 0 and I doubt this has the desired effect. I'm not sure to what extend it matters what value we use for 'undef' lanes. Can we just initialize the TBLMask with all numeric_limits<type>::max index value? | |
25980–25982 | Please write this as: } else if (Subtarget->hasSVE2()) { ... } else return SDValue(); |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25930 | yes, we need this check to understand that we just electing from operand 2 and operand 1 is unused. | |
25933 | hmm, ShuffleMask is readonly variable. | |
25937 | yes and for second source operand the index starts from 256, so that is what we are checking here is we have to address to both operands then it is going to be 255 + index and we already established that both operands involved. | |
25938 | We already checked all values in the mask. |
Resolved remained remarks, added support for half sized mask, N-sized mask. if just minimum SVE known then allow SVE TBL with just one operand transform.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25941–25945 | I think |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25941–25945 | I think we allowing this lowering to only one operand supported with SVE1 TBL. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25910–25911 | Comment seems unaddressed. | |
25933 | You can create a new ShuffleMask variable, which is writable. | |
25938 | I think you can make this into an assert that NumElts <= IndexLen, because at the point of lowering the operation, the types will have been legalised in such a way that the number of lanes of the mask >= number of lanes of the source operand. | |
25939 | There are two things not right about this:
| |
25951 | What is the meaning of 'Op1IsUnused' after swapping Op1 and Op2? Should that be Op2IsUnused now? To me it seems better to update the ShuffleMask directly and remove the need for Op*IsUnused altogether | |
25956 | It's normally better to bail out early, e.g. if (condition) { // lots of code } return SDValue(); -> if (!condition) return SDValue // lots of code | |
25960 | This condition and the comment above make little sense to me. The issue to solve is instead about the maximum *index* value in the shuffle mask. If the maximum value being indexed cannot be represented in an i8, then we cannot use TBL2 to lower this shuffle vector. For example: define <16 x i8> @foo(ptr %ptr0, ptr %ptr1) { %op0 = load <256 x i8>, ptr %ptr0 %op1 = load <256 x i8>, ptr %ptr1 ; Cannot use TBL2 instruction because any offset > 255 cannot be represented in an i8! %res = shufflevector <256 x i8> %op0, <256 x i8> %op1, <16 x i32> <i32 0,i32 1,i32 2,i32 3,i32 4, i32 5, i32 6, i32 7, i32 256,i32 257, i32 258, i32 259, i32 260, i32 261, i32 262, i32 263> ret <16 x i8> %res } However, the code in the patch seems to accept this without an issue and generates wrong code with the wrong indices. .LCPI0_0: .byte 0 // 0x0 .byte 1 // 0x1 .byte 2 // 0x2 .byte 3 // 0x3 .byte 4 // 0x4 .byte 5 // 0x5 .byte 6 // 0x6 .byte 7 // 0x7 .byte 16 // 0x10 .byte 33 // 0x21 .byte 34 // 0x22 .byte 35 // 0x23 .byte 36 // 0x24 .byte 37 // 0x25 .byte 38 // 0x26 .byte 39 // 0x27 .byte 255 // 0xff .byte 255 // 0xff .byte 255 // 0xff .byte 255 // 0xff .byte 255 // 0xff .byte 255 // 0xff .byte 255 // 0xff .byte 255 // 0xff .byte 255 // 0xff .byte 255 // 0xff .byte 255 // 0xff .byte 255 // 0xff .byte 255 // 0xff .byte 255 // 0xff .byte 255 // 0xff .byte 255 // 0xff .text .globl foo .p2align 2 .type foo,@function foo: // @foo .cfi_startproc // %bb.0: adrp x8, .LCPI0_0 add x8, x8, :lo12:.LCPI0_0 ptrue p0.b ldr q1, [x1] ldr q0, [x0] ld1b { z2.b }, p0/z, [x8] tbl z0.b, { z0.b, z1.b }, z2.b // kill: def $q0 killed $q0 killed $z0 ret | |
25961 | Please check this condition earlier up for an early bail out! (at same place where it checks MinSVESize) | |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-vector-shuffle.ll | ||
601 | Why doesn't this function use the TBL instruction? I don't see anything that stops it from being to use TBL? |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25960 | I think this is correct, here is full DAG dump: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t18: v16i8,ch = load<(load (s128) from %ir.ptr0, align 256)> t0, t2, undef:i64 t4: i64,ch = CopyFromReg t0, Register:i64 %1 t16: v16i8,ch = load<(load (s128) from %ir.ptr1, align 256)> t0, t4, undef:i64 t11: v16i8 = vector_shuffle<0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23> t18, t16 t13: ch,glue = CopyToReg t0, Register:v16i8 $q0, t11 t22: nxv16i8 = insert_subvector undef:nxv16i8, t18, Constant:i64<0> t23: nxv16i8 = insert_subvector undef:nxv16i8, t16, Constant:i64<0> t14: ch = AArch64ISD::RET_GLUE t13, Register:v16i8 $q0, t13:1 | |
25961 | No, Most of those checks related to support TBL2 both opperand used and at the place where MinSVESize check we don't know content of the mask, thus could not tell that both operand would be used or not. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25671–25672 | nit: You don't need to pass MinSVESize and MaxSVESize, you can easily get this again from the Subtarget. | |
25722 | This swap code seems to be unnecessary, because SelectionDAGBuilder already does this for you. You can see that if you compile: define <16 x i8> @foo(ptr %ptr0, ptr %ptr1) vscale_range(16, 16) { %op0 = load <16 x i8>, ptr %ptr0 %op1 = load <16 x i8>, ptr %ptr1 ; Takes only elements from second vector %res = shufflevector <16 x i8> %op0, <16 x i8> %op1, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 31, i32 31> ret <16 x i8> %res } Which results in: t0: ch,glue = EntryToken t4: i64,ch = CopyFromReg t0, Register:i64 %1 t8: v16i8,ch = load<(load (s128) from %ir.ptr1)> t0, t4, undef:i64 t10: v16i8 = vector_shuffle<0,1,2,3,4,5,6,7,8,9,10,11,12,13,15,15> t8, undef:v16i8 t12: ch,glue = CopyToReg t0, Register:v16i8 $q0, t10 t13: ch = AArch64ISD::RET_GLUE t12, Register:v16i8 $q0, t12:1 That means you don't need to modify the Shufflemask at all, and you can call ShuffleVectorInst:: isSingleSourceMask(ShuffleMask) to calculate IsSingleOp. If isSingleSourceMask returns true, then you can assume it will select elements from the first source operand. That will simplify this patch quite a bit! | |
25960 | It seems SelectionDAGBuilder has extracted a subregister from the two <256 x i8> vectors, hence why the offsets are valid. This clearly wasn't the right example. The following case can't be transformed like that: define <16 x i8> @foo(ptr %ptr0, ptr %ptr1) vscale_range(16, 16) { %op0 = load <256 x i8>, ptr %ptr0 %op1 = load <256 x i8>, ptr %ptr1 ; Cannot use TBL2 instruction because any offset > 255 cannot be represented in an i8! %res = shufflevector <256 x i8> %op0, <256 x i8> %op1, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 252, i32 253, i32 254, i32 255, i32 256, i32 257, i32 258, i32 259, i32 508, i32 509, i32 510, i32 511> ret <16 x i8> %res } It's only because SelectionDAG inserts the mask into a <256 x i8> vector that your code happens to discard this case (because that will fail the check MaxOffset <= NumElts * 2). I think I'd still prefer an explicit check that the maximum index value in the ShuffleMask doesn't exceed the maximum value of the element type. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25692–25705 | If you define MaxOffset as (1 << BitsPerElt) - 1, you can remove the switch statement? | |
25707–25712 | Please move this code up after if (!MinSVESize) return SDValue(); | |
25716 | You're using unsigned here, but ShuffleMask is actually a vector of ints. undef elements are represented as index (int) -1. When you're comparing this with MaxOffset for example, things get a little confusing and I think this is hiding some bugs. Can you instead make all the ShuffleMask values use signed integers? |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25686 | Please rename this to something more descriptive, e.g. ElementsPerVector | |
25689 | Seems more generic to write: uint64_t MaxOffset = APInt(BitsPerElt, -1, false).getZExtValue(); | |
25695 | What is the point of this variable? In my previous comment I suggested that the calculation should be done on signed values because the ShuffleMask has signed values . Doing: for (int Val : ShuffleMask) { unsigned Offset = Val; // rest of code using Offset instead of Val } does not solve that problem. | |
25696 | I don't think you need to distinguish IsSingleOp here, because if IsSingleOp == true, then Offset > NumElts, so the code below already covers it. | |
25706–25707 | You'll need to update the real offset before checking whether Offset >= MaxOffset. For example, for vscale_range(16, 16) the index into the next vector would always be at least 256, which cannot be represented in an i8. Moving this code above the previous condition fixes that. | |
25715 | nit: you can simplify this to -1 | |
25731–25732 | Combine this into an else if |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25686 |
Perhaps ElementsPerVectorReg is more appropriate (given that there is already a NumElts variable). |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25695 | hmm, not expecting anytging harmful here, according to the specification : "Non-negative elements in the mask represent an index." |
Addressed comments. Also, I found error in my last change where I was comparing "Offset > ElementsPerVectorReg", but It whould be ">=", Fixed.
Thanks for the latest round of changes. I have a few more comments, but this seems to be moving in the right direction.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25695 |
That doesn't mean the value cannot be negative. I prefer the code to be explicit about what happens when the value is negative, rather than this being implicit by using an unsigned variable. ShuffleMask is an array of signed integers. When you then iterate over this array of signed integers, it makes sense to make that variable signed as well. | |
25721 | nit: you can sink the convertFromScalableVector out of the if/else conditions, and moving it to the return. | |
llvm/test/CodeGen/AArch64/sve-fixed-length-vector-shuffle.ll | ||
963 | Missing CHECK lines for this symbol. | |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-vector-shuffle.ll | ||
378 | You have added these tests to sve-streaming-mode-fixed-length-vector-shuffle.ll, but the sve-streaming-mode-* tests should not contain any tests with specific vscale_range(..) attributes, because the point is more to test that no NEON is being used and streaming(-compatible) SVE is used instead. | |
668 | What is unacceptable about the form? |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-vector-shuffle.ll | ||
---|---|---|
668 | We could see in the result that three tbl instructions were used instead of one, because the result vector is not reprsentable on the hardware. |
Just a few minor nits, otherwise looks good to me.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25694 | nit: This should be called Index instead of Offset. | |
25702 | If you pull out this condition to the top of the loop and return early for negative offsets, then in the rest of the loop you can assume Offset is unsigned and remove some of the (int) casts you have in there. | |
25707–25708 | nit: this statement is a bit unclear to me. Why is it better? Do we want to say "Choosing an out-of-range index leads to the lane being zeroed". | |
25720 | nit: unnecessary curly braces. | |
25728 | Just to be safe, could you add an else branch here with llvm_unreachable("Cannot lower shuffle without SVE2 TBL"); ? | |
25910–25911 | nit: odd line-break. |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-vector-shuffle.ll | ||
---|---|---|
5 | Drive by comment: can we add a RUN for 256 bits? It might be interesting to see the codegen for that too because there is a 256 bit SVE implementation and we have seen some inefficiencies for these type of things. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25695 | When one of the indices is poison, it will encode that in the ShuffleMask with -1. Bailing out may be too conservative, you could for example choose to select the first element instead. With your patch at the moment: define <4 x float> @foo(ptr %a, ptr %b) nounwind vscale_range(1, 1){ %op1 = load <4 x float>, ptr %a %op2 = load <4 x float>, ptr %b %1 = shufflevector <4 x float> %op1, <4 x float> %op2, <4 x i32> <i32 0, i32 3, i32 5, i32 poison> ret <4 x float> %1 } results in: foo: // @foo ldr q0, [x1] ldr q1, [x0] mov z0.s, z0.s[1] mov z2.s, z1.s[3] str s1, [sp, #-16]! stp s2, s0, [sp, #4] ldr q0, [sp], #16 ret whereas: define <4 x float> @foo(ptr %a, ptr %b) nounwind vscale_range(1, 1){ %op1 = load <4 x float>, ptr %a %op2 = load <4 x float>, ptr %b %1 = shufflevector <4 x float> %op1, <4 x float> %op2, <4 x i32> <i32 0, i32 3, i32 5, i32 1> ret <4 x float> %1 } results in: foo: // @foo adrp x8, .LCPI0_0 ldr q0, [x0] ldr q1, [x1] ldr q2, [x8, :lo12:.LCPI0_0] tbl z0.s, { z0.s, z1.s }, z2.s ret It would also be good to have a test for this case (one of the shufflemask lanes being poison) | |
25710 | nit: s/world perfrom/would perform/ ? | |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-vector-shuffle.ll | ||
4 | This line is exactly the same as on line 2? | |
8 | I would actually prefer to keep these tests kind of the same as they were, and for any new tests to be moved to a new test file, e.g. sve-fixed-length-vector-shuffle-tbl.ll. Where:
I'd recommend using the same size for the vectors in tests (now your test is using <4 x i8>, <8 x i8> and <16 x i8>, but I don't think that makes any difference based on the code you've added. Maybe you can just stick with <16 x i8> instead. If you go for 128-bit vectors, you'll need to add the -force-streaming-compatible flag to force the use of SVE instructions over NEON. |
llvm/test/CodeGen/AArch64/sve-fixed-length-vector-shuffle-tbl.ll | ||
---|---|---|
2 ↗ | (On Diff #557527) | Can you remove this RUN line? This RUN line is identical to the last RUN line. |
3 ↗ | (On Diff #557527) | Can you stick with one vector width (e.g. min==max==128)? I don't think the line below (min==max==256) is testing anything that the current RUN line isn't already testing when -force-stremaing-compatible is enabled. |
110 ↗ | (On Diff #557527) | Please rename this to something like shuffle_index_indices_from_op1 to make it clear the mask takes all the lanes from %op1. |
211 ↗ | (On Diff #557527) | Please rename this to something like shuffle_index_indices_from_op2 to make it clear the mask takes all the lanes from %op2. |
345 ↗ | (On Diff #557527) | Please rename this to something like shuffle_index_indices_from_both_ops to make it clear the mask takes all the lanes from both %op1 and %op2. |
349 ↗ | (On Diff #557527) | Please remove the acceptable and _unacceptable from the names. If something cannot use the TBL instruction, then prepend the name with negative_. |
380 ↗ | (On Diff #557527) | As I said here, one of the indices being poison should not be a reason not to use vector instructions. This case can be handled by the algorithm. |
384 ↗ | (On Diff #557527) | Please inline "target-features"="+sve2" vscale_range(16,16) here rather than using #0, as that makes the test a bit easier to read. |
384 ↗ | (On Diff #557527) | Can you add a comment that clarifies why this function doesn't use TBL? |
446 ↗ | (On Diff #557527) | This test is named _unacceptable_form, yet it uses TBL without issues. So this is not an "unacceptable form"? :) |
702 ↗ | (On Diff #557527) | How is this test different from shuffle_index_size_acceptable_op_both, other than it having a slightly different mask? |
833 ↗ | (On Diff #557527) | How is this test different from shuffle_index_size_acceptable_op_both, other than it having a slightly different mask and a wider return type? |
889 ↗ | (On Diff #557527) | I don't think this is a good test, because this test is relying on a DAGCombine that swaps %op1 and %op2. I think you can just remove it. |
llvm/test/CodeGen/AArch64/sve-fixed-length-vector-shuffle-tbl.ll | ||
---|---|---|
118 ↗ | (On Diff #557533) | This test is missing check lines for the shuffle mask, could you please add it? |
177 ↗ | (On Diff #557533) | I'm a bit puzzled, why did you add this test? |
201 ↗ | (On Diff #557533) | nit: I meant prepend do the start of the name, e.g. @negative_test_shuffle_index_size_op_both_maxhw |
llvm/test/CodeGen/AArch64/sve-fixed-length-vector-shuffle-tbl.ll | ||
---|---|---|
118 ↗ | (On Diff #557533) | No, for the poison value it generates different value for different conditions (hardware vector size, etc), but for this test always in range of offset for operand 1, I am not sure it is consistant. |
2 ↗ | (On Diff #557527) | No, it is not here we targeting sve, not sve2. |
llvm/test/CodeGen/AArch64/sve-fixed-length-vector-shuffle-tbl.ll | ||
---|---|---|
118 ↗ | (On Diff #557533) | I don't think there's a benefit to choosing one index over another, so what if you change the code to always choose lane 0 if the index < 0? That way, you can add the CHECK lines. |
LGTM, thanks!
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
25696 | nit: I don't know if this can ever happen, but maybe it's more safe to write if (Index < 0) if any other values can be used to represent poison. |
nit: You don't need to pass MinSVESize and MaxSVESize, you can easily get this again from the Subtarget.