This is an archive of the discontinued LLVM Phabricator instance.

[DAGComb] Do not turn insert_elt into shuffle for single elt vectors.
Closed, Public

Authored by fhahn on May 28 2020, 4:05 AM.

Details

Summary

Currently combineInsertEltToShuffle turns insert_vector_elt into a
vector_shuffle, even if the inserted element is a vector with a single
element. In this case, it is unlikely that the additional shuffle
would be more efficient than an insert_vector_elt.

Additionally, this fixes an infinite cycle in DAGCombine, where
combineInsertEltToShuffle turns an insert_vector_elt into a shuffle,
which gets turned back into an insert_vector_elt/extract_vector_elt by
a custom AArch64 lowering (in visitVECTOR_SHUFFLE).
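
The cycle described above can be illustrated with a small toy model. This is
only a sketch, not the actual patch: Node, Op, combineInsertToShuffle,
targetShuffleToInsert and runCombines are hypothetical names, and no LLVM API
is used. It assumes the fix amounts to bailing out of the generic combine when
the source vector has a single element.

```cpp
// Toy stand-in for a DAG node; this is NOT LLVM's SDNode.
enum class Op { InsertElt, Shuffle };

struct Node {
  Op op;
  unsigned numSrcElts; // element count of the vector the inserted value comes from
};

// Hypothetical model of the generic combine (insert_vector_elt -> shuffle).
// With guardSingleElt set, single-element sources are left untouched.
bool combineInsertToShuffle(Node &n, bool guardSingleElt) {
  if (n.op != Op::InsertElt)
    return false;
  if (guardSingleElt && n.numSrcElts == 1)
    return false; // assumed shape of the fix: keep the insert_vector_elt
  n.op = Op::Shuffle;
  return true;
}

// Hypothetical model of the target rewrite (shuffle -> insert/extract).
// Simplification: it fires unconditionally here, unlike a real lowering.
bool targetShuffleToInsert(Node &n) {
  if (n.op != Op::Shuffle)
    return false;
  n.op = Op::InsertElt;
  return true;
}

// Applies both rules until nothing changes. Returns the number of rewrites,
// or -1 if maxSteps is reached, which this model treats as a cycle.
int runCombines(Node n, bool guardSingleElt, int maxSteps = 100) {
  int steps = 0;
  bool changed = true;
  while (changed) {
    changed = combineInsertToShuffle(n, guardSingleElt) ||
              targetShuffleToInsert(n);
    if (changed && ++steps >= maxSteps)
      return -1;
  }
  return steps;
}
```

In this model, a single-element source ping-pongs between the two rules when
the guard is off, and reaches a fixed point immediately when it is on.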

Such insert_vector_elt and extract_vector_elt combinations can be
lowered efficiently using mov on AArch64.

There are 2 test changes in arm64-neon-copy.ll: we now use one or two
mov instructions instead of a single zip1. The second mov in ins1f2 is
needed because we have to move the result to the result register, which
I think is not really related to the DAGCombine fold.
But in any case, on most uarchs, mov should be cheaper than zip1. On a
Cortex-A75, for example, zip1 is twice as expensive as mov
(https://developer.arm.com/docs/101398/latest/arm-cortex-a75-software-optimization-guide-v20).

Event Timeline

fhahn created this revision. May 28 2020, 4:05 AM
Herald added a project: Restricted Project. May 28 2020, 4:05 AM
RKSimon accepted this revision. May 28 2020, 4:27 AM

LGTM

This revision is now accepted and ready to land. May 28 2020, 4:27 AM
This revision was automatically updated to reflect the committed changes.