Attempt to lower a shuffle as a permute instruction(zip/uzp/trn) for fixed length SVE.
Details
Diff Detail
Unit Tests
Time | Test | |
---|---|---|
80 ms | x64 debian > Clang.CodeGen::lto-newpm-pipeline.c |
Event Timeline
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19316 | This looks a bit wrong because VT=Fixed Length Vector Type, but Op1 and Op2 have scalable vector types. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19316 | No, the code here is correct. VT and ShuffleMask are used to check the type of the mask value, like zip or uzp mask. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
9686–9689 | I don't believe this logic is universally safe. The use of ZIP2 specifically relies on knowing the exact size of the register to know which indices represent the "high half" of a vector register. This is something that is only known when sve-vector-bits-min==sve-vector-bits-max. I also suspect functions like isZIPMask were written when only 64 and 128 bit legal vectors existed. I doubt the logic holds for the case when a vector is legal but not the exact size of the target vector register, as is the case with the SVE fixed length support. I have not looked into the extent at which the other shuffles are affected but I suspect each have their own complexity. | |
llvm/test/CodeGen/AArch64/sve-fix-length-permute-zip-uzp-trn.ll | ||
22 | To highlight my previous comment. Index 16 is only the first byte of the second half of a 256bit vector and thus if the target has bigger vector register the use of zip2 will not have the correct behaviour. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
9686–9689 | Thanks, I get it. It seems that there are potential problems for the fixed-length shuffle vector lowering, include the patch https://reviews.llvm.org/D105289. I will update a new revision later to fix it by adding the check: sve-vector-bits-min==sve-vector-bits-max |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
9686–9689 | I believe that patch is ok given the restrictions it imposes. It is fine using the is###Mask functions as a way to identify a named shuffle. What is unsafe is assuming is###Mask means the ### instruction can be used directly when converted to scalable vectors. So taking D105289 you can see that is uses isEXTMask to determine the type of shuffle but then the actually lowering code only handles a specific case and that case does not emit an EXT instruction as that would have fallen into the same trap. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19319–19320 | I still don't think this is enough as you've missed my previous I doubt the logic holds for the case when a vector is legal but not the exact size of the target vector register comment. Further discussion is attached to the test zip_v32i8 below... I'm starting to wonder if it's worth breaking out lowerShuffleToZIP_UZP_TRN as I feel like once support is added for more of the combinations there's not likely to be much reuse. For example taking the zip1/zip2 case I think supporting zip1 is likely simple and just works for all legal VTs, but zip2 requires the VT to be exactly the size of the target vector (i.e. VT.getSizeInBits() == Subtarget->getMinSVEVectorSizeInBits() == Subtarget->getMaxSVEVectorSizeInBits()) or perhaps a different instruction sequence. | |
llvm/test/CodeGen/AArch64/sve-fix-length-permute-zip-uzp-trn.ll | ||
25–26 | The output for VBITS_EQ_256 and VBITS_EQ_512 is the same, which doesn't make sense considering the target vector length is different. For the VBITS_EQ_256 case index 16 is the start of the high half of the target vector and thus zip2 works. But for the VBITS_EQ_512 case index 16 is actually the start of the second quarter of the target vector and thus zip2 will do the wrong thing. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19319–19320 | Thanks, I agree with you. Perhaps the current implementation of lowerShuffleToZIP_UZP_TRN reusing neon code is not so good, since SVE may have variable length. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19319–19320 | Looking at the six shuffles trn1, trn2, uzp1, uzp2, zip1, zip2 I believe that when min_length != max_length != VT.getSizeInBits() only trn1, trn2 & zip1 are valid. We agree that zip2 is not valid because the top half of the input fixed length vector does not necessarily map to the top half of a scalable vector. The reason I believe both uzp variants are also invalid is because their underlying operation is to concat both input vectors and then extract every second element. However, the vectors we're concatenating may contain junk so you end up with: expected uzp1 [A | B | C | D], [E | F | G | H] => [A | C | E | G] SVE_VLS uzp [A | B | C | D | ? | ? | ? | ?], [E | F | G | H | ? | ? | ?] => [A | C | ? | ? | E | G | ? | ?] And thus when you extract the fixed length result you'll end up with [A | C | ? | ?] This I think further reduces the value of having lowerShuffleToZIP_UZP_TRN as it would be clearer to just handle each of the scenarios separately. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19319–19320 | @paulwalker-arm Thanks for your detailed explanation. I fully agree with your point of view. Indeed, we don’t need lowerShuffleToZIP_UZP_TRN anymore, handle each case separately will make the code clearer. I will try to refactor the code later. |
Just a passing review I'm afraid. I'll take a proper look tomorrow.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19319 | I think you can use ContainerVT here? It might help with the formatting. Same goes for the other places where you use Op1.getValueType(). | |
19331–19340 | I've not thought about it deeply so might be wrong but given these cases only use Op1 I'm wondering if they're always safe and thus don't need to be part of the MinSVESize == MaxSVESize == VT.getSizeInBits() restricted set? |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19338–19341 | I figured this comment could be better and started to write something but ended up with Functions like isZIPMask return true when a ISD::VECTOR_SHUFFLE's mask represents the same logical operation as performed by a ZIP instruction. In isolation these functions do not mean the ISD::VECTOR_SHUFFLE is exactly equivalent to an AArch64 instruction. There's the extra component of ISD::VECTOR_SHUFFLE's value type to consider. Prior to SVE these functions only operated on 64/128bit vector types that have a direct mapping to a target register and so an exact mapping is implied. However, when using SVE for fixed length vectors, most legal vector types are actually sub-vectors of a larger SVE register. When mapping ISD::VECTOR_SHUFFLE to an SVE instruction care must be taken to consider how the mask's indices translate. Specifically, when the mapping requires an exact meaning for a specific vector index (e.g. Index X is the last vector element in the register) then such mappings are often only safe when the exact SVE register size is know. The main exception to this is when indices are logically relative to the first element of either ISD::VECTOR_SHUFFLE operand because these relative indices don't change when converting from fixed-length to scalable vector types (i.e. the start of a fixed length vector is always the start of a scalable vector). Which is more like a novel than a comment :) I've posted it anyway just in case there's something in there that's useful. | |
llvm/test/CodeGen/AArch64/sve-fixed-length-permute-zip-uzp-trn.ll | ||
10–14 ↗ | (On Diff #394577) | It looks like InterleavedAccessPass is causing your code to be bypassed. I couldn't immediately see a way to disable the pass other than using -O0 which might cause other issues but I think if you use volatile loads and stores within these tests you'll get what you need. |
413 ↗ | (On Diff #394577) | Please place the attributes together at the end of the file because otherwise they're hard to find when trying to see what attributes exist for a specific function. |
627 ↗ | (On Diff #394577) | As above. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19319 | Changed, using ContainerVT to replace Op1.getValueType() | |
19331–19340 | I don't think it's safe. For undef case, Op1 will be used as input twice, it still has the problem like below: uzp [A | B | C | D | ? | ? | ? | ?], [A | B | C | D | ? | ? | ?] => [A | C | ? | ? |A | C | ? | ?] | |
19338–19341 | Thanks for your comment, I used it directly in the code | |
llvm/test/CodeGen/AArch64/sve-fixed-length-permute-zip-uzp-trn.ll | ||
10–14 ↗ | (On Diff #394577) | Thanks,volatile can work |
413 ↗ | (On Diff #394577) | done |
llvm/test/CodeGen/AArch64/sve-fixed-length-permute-zip-uzp-trn.ll | ||
---|---|---|
419 ↗ | (On Diff #395107) | I think there's value in adding a comment that states why we can safely emit zip2 instructions. Not too detailed, it's just worth drawing the readers attention to the fact that for this and related tests the runtime vector length is known. |
481 ↗ | (On Diff #395107) | As above, there's value in adding a comment that mentions why we can safely emit uzp instructions. |
627 ↗ | (On Diff #395107) | Please add a small comment to highlight this is a negative test and why only zip1 instructions are emitted. |
I don't believe this logic is universally safe. The use of ZIP2 specifically relies on knowing the exact size of the register to know which indices represent the "high half" of a vector register. This is something that is only known when sve-vector-bits-min==sve-vector-bits-max.
I also suspect functions like isZIPMask were written when only 64 and 128 bit legal vectors existed. I doubt the logic holds for the case when a vector is legal but not the exact size of the target vector register, as is the case with the SVE fixed length support.
I have not looked into the extent at which the other shuffles are affected but I suspect each have their own complexity.