This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Reuse larger DUP if available
ClosedPublic

Authored by dmgreen on May 26 2022, 12:32 AM.

Details

Summary

If both a v2i32 DUP(x) node and a v4i32 DUP(x) node exist, we can reuse the larger node and use a vector extract to obtain the smaller one. This comes up in the smull/smlal code, but it needs a small fixup so that the smull2 code in tryExtendDUPToExtractHigh/performAddSubLongCombine can still match smull2 extracts.

Event Timeline

dmgreen created this revision. May 26 2022, 12:32 AM
Herald added a project: Restricted Project. May 26 2022, 12:32 AM
dmgreen requested review of this revision. May 26 2022, 12:32 AM
samtebbs added inline comments. May 26 2022, 1:41 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18245

In tryExtendDUPToExtractHigh above, the EXTRACT_SUBVECTOR gets NumElems as the constant, but this one gets 0. Why is it 0 in this case? Seems like an odd number to give EXTRACT_SUBVECTOR.

dmgreen added inline comments. May 26 2022, 2:20 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18245

It means "extract the bottom lanes", i.e. we start from lane 0. All the lanes are equal in a dup, but it is the bottom ones that are free to extract.

The NumElems in tryExtendDUPToExtractHigh extracts the high half, because NumElems is the number of lanes in the 64-bit vector. So it extracts the high 64 bits from a 128-bit vector and can produce a smull2 instruction as a result.
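
As a rough illustration of the two extract indices discussed above, here is a toy model that treats a vector register as a plain list of lanes. The names Lanes, dup, extractSubvector, lowHalf and highHalf are hypothetical helpers invented for this sketch, not LLVM APIs; the actual patch works on SelectionDAG nodes rather than on concrete values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model (assumption): a vector register is just a list of 32-bit lanes.
using Lanes = std::vector<int32_t>;

// Models DUP(x): every lane holds the same scalar value.
Lanes dup(int32_t x, std::size_t numLanes) {
  return Lanes(numLanes, x);
}

// Models EXTRACT_SUBVECTOR(vec, start): copy `count` consecutive lanes,
// beginning at lane index `start`.
Lanes extractSubvector(const Lanes &vec, std::size_t start, std::size_t count) {
  assert(start + count <= vec.size() && "subvector out of range");
  auto first = vec.begin() + static_cast<std::ptrdiff_t>(start);
  return Lanes(first, first + static_cast<std::ptrdiff_t>(count));
}

// Index 0: the bottom lanes of the wide vector.
Lanes lowHalf(const Lanes &wide) {
  return extractSubvector(wide, 0, wide.size() / 2);
}

// Index NumElems (the lane count of the narrow vector): the top lanes.
Lanes highHalf(const Lanes &wide) {
  std::size_t numElems = wide.size() / 2;
  return extractSubvector(wide, numElems, numElems);
}
```

In this model, both lowHalf(dup(x, 4)) and highHalf(dup(x, 4)) compare equal to dup(x, 2), which is why the smaller DUP can be replaced by an extract from the larger one. The model only captures lane values; it says nothing about why the index-0 extract is the free one on AArch64, or why the high-half form is what lets smull2 match.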

samtebbs accepted this revision. May 26 2022, 2:35 AM

LGTM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18245

Great explanation. Thanks.

This revision is now accepted and ready to land. May 26 2022, 2:35 AM
This revision was landed with ongoing or failed builds. May 29 2022, 11:42 AM
This revision was automatically updated to reflect the committed changes.