As combining DUP, try to reuse larger DUPLANELANE.
Details
Diff Detail
Event Timeline
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
22119–22122 | Should this be excluded for everything that isn't DUP? | |
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
6950 ↗ | (On Diff #541489) | Would this be needed for all the "AdvSIMD indexed element" operations? Is there some way to make that fairly generic? Maybe a PatFrag that matches either v4i16 (AArch64duplane16(.. or extract_subvector (AArch64duplane16 (.., that can be used in the instruction patterns like those in SIMDVectorIndexedHSTied. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
22119–22122 | I do not know how the performPostLD1Combine function works in detail but the function checks whether the input node is ISD::LOAD. If it is not ISD::LOAD, the function returns SDValue(). | |
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
6950 ↗ | (On Diff #541489) | I was not just sure the patterns are needed for all SIMD indexed operations. |
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
6950 ↗ | (On Diff #541489) | Additionally, when I added def v2i32_indexed_low in SIMDVectorIndexedLongSD, I got Decoding conflict as below because it causes same encoding so I added Pats. Decoding Conflict: 0000111110......1010.0.......... 0000111110......1010............ 0000111110...................... ...0111110...................... ...011.......................... ................................ SMULLv2i32_indexed 0000111110______1010_0__________ SMULLv2i32_indexed_low 0000111110______1010_0__________ Decoding Conflict: 0010111110......1010.0.......... 0010111110......1010............ 0010111110...................... ...0111110...................... ...011.......................... ................................ UMULLv2i32_indexed 0010111110______1010_0__________ UMULLv2i32_indexed_low 0010111110______1010_0__________ |
Could we add a PatFrags that looks something like this:
def dup_v8i16 : PatFrags<(ops node:$LHS, node:$RHS), [(v4i16 (extract_subvector (v8i16 (AArch64duplane16 (v8i16 node:$LHS), node:$RHS)), (i64 0))), (v4i16 (AArch64duplane16 (v8i16 node:$LHS), node:$RHS))]>;
So that it matches either a v4i16 DUPLANE, or a subvector_extract of a v8i16 DUPLANE. Maybe call it something like AArch64duplanev416, I'm not sure. It could then be used in the existing set patterns for all the lane instructions. I think many of them might hit the same problems we see in the tests, but don't happen to have tests for every instruction.
[(set (v4i32 V128:$Rd), (OpNode (v4i16 V64:$Rn), (dup_v8i16 (v8i16 V128_lo:$Rm), VectorIndexH:$idx)))]> {
Yep, I agree with you. If possible, I did not want to touch existing patterns because it could cause other regressions.
Let me update the patterns with Patfrags.
Thanks. Looks pretty good. Do we need to handle the other "indexing" operations in the same way? For example something like this below, which is for fmul. I would guess that the extending operations (umull) are more likely to see the problem, but you can imagine the others in the "AdvSIMD indexed element" section of AArch64InstrInfo.td might all be better using dup_v8i16 now.
define <4 x float> @sel.v8i16(ptr %p, ptr %q, <4 x float> %a, <4 x float> %b, <2 x float> %c) { %splat = shufflevector <4 x float> %a, <4 x float> poison, <4 x i32> zeroinitializer %splat2 = shufflevector <4 x float> %a, <4 x float> poison, <2 x i32> zeroinitializer %r = fmul <4 x float> %b, %splat %r2 = fmul <2 x float> %c, %splat2 store <2 x float> %r2, ptr %p ret <4 x float> %r }
I have added dup_v8i16 and dup_v4i32 into SIMDVectorIndexedLongSD and SIMDIndexedLongSD multiclass.
The umull uses SIMDVectorIndexedLongSD like smull so it will be matched with the dup_v8i6 and dup_v4i32.
Let me check the fmul example and 64-bits AArch64duplane with VectorIndexS:$idx.
Cheers. LGTM, but can you try and add some tests for some of the other cases, maybe based on the code in the comment above. Just in case we need to change it in the future again, we will notice they are not as good. Thanks.
Should this be excluded for everything that isn't DUP?