If both a v2i32 DUP(x) and a v4i32 DUP(x) node exist, we can reuse the larger node and use a vector extract to obtain the smaller one. This comes up in the smull/smlal code, but needs a small fixup so that the smull2 code in tryExtendDUPToExtractHigh/performAddSubLongCombine can still match smull2 extracts.
| llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
|---|---|---|
| 18155 | In tryExtendDUPToExtractHigh above, the EXTRACT_SUBVECTOR gets NumElems as the constant, but this one gets 0. Why is it 0 in this case? Seems like an odd number to give EXTRACT_SUBVECTOR. | |
| llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
|---|---|---|
| 18155 | It means "extract the bottom lanes", i.e. we start from lane 0. All the lanes are equal in a dup, but it is the bottom ones that are free to extract. The NumElems in tryExtendDUPToExtractHigh extracts the high half, because NumElems is the number of lanes in the 64-bit vector. So it extracts the high 64 bits from a 128-bit vector and can produce a smull2 instruction as a result. | |
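
To make the index semantics concrete, here is a minimal standalone sketch that models DUP and EXTRACT_SUBVECTOR with plain `std::vector`. It is not the SelectionDAG API; the names `Vec`, `dup` and `extractSubvector` are hypothetical helpers made up for illustration, and the model only assumes that the constant operand is the starting lane.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a vector of i32 lanes (not an LLVM type).
using Vec = std::vector<int32_t>;

// Model of DUP(x): every lane holds the same scalar.
Vec dup(int32_t x, std::size_t numLanes) { return Vec(numLanes, x); }

// Model of EXTRACT_SUBVECTOR(v, idx): take numElems lanes starting at lane idx.
Vec extractSubvector(const Vec &v, std::size_t idx, std::size_t numElems) {
  assert(idx + numElems <= v.size() && "subvector out of range");
  auto first = v.begin() + static_cast<std::ptrdiff_t>(idx);
  return Vec(first, first + static_cast<std::ptrdiff_t>(numElems));
}
```

With a v4i32-shaped `dup(x, 4)`, `extractSubvector(wide, 0, 2)` and `extractSubvector(wide, 2, 2)` both compare equal to `dup(x, 2)`, so the values match either way. The choice of index matters only for which half is selected: index 0 is the low half discussed in the reply above, while index NumElems (2 in this model) is the high half that the smull2 patterns look for.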