Page MenuHomePhabricator

[SLP]Do not emit extract elements for insertelements users, replace with shuffles directly.
Needs ReviewPublic

Authored by ABataev on Aug 12 2021, 8:26 AM.

Details

Summary

SLP vectorizer emits extracts for externally used vectorized scalars and
estimates the cost for each such extract. But in many cases these
scalars are input for insertelement instructions, forming buildvector,
and instead of extractelement/insertelement pair we can emit/cost
estimate shuffle(s) cost and generate series of shuffles, which can be
further optimized.

Tested using test-suite (+SPEC2017), the tests passed, SLP was able to
generate/vectorize more instructions in many cases and it allowed to reduce
number of re-vectorization attempts (where we could try to vectorize
buildector insertelements again and again).

Diff Detail

Event Timeline

ABataev created this revision.Aug 12 2021, 8:26 AM
ABataev requested review of this revision.Aug 12 2021, 8:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 12 2021, 8:26 AM
ABataev updated this revision to Diff 368855.Aug 26 2021, 5:24 AM

Rebased. Checked that the test SLPVectorizer/X86/remark_extract_broadcast.ll (mentioned in D108703) is updated as requested.

lebedev.ri added inline comments.
llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll
19–23

Thanks! This is clearly an improvement, but these two shuffles are still clearly redundant,
because in either case, you end up with 0'th element of LD in some elements of output.
In this case you could simply drop the first shuffle, and do the second one directly.

ABataev added inline comments.Aug 26 2021, 5:34 AM
llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll
19–23

I think, in codegen the first shuffle will be simply dropped (this is an identity shuffle). But I'll check what can be improved here.

lebedev.ri added inline comments.Aug 26 2021, 5:37 AM
llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll
19–23

Ignoring more complicated cases, perhaps the key point here
is that the TMP0 is an identity (=>single-source), non-width-changing shuffle,
so it can be naturally dropped. ShuffleVectorInst::isIdentityMask() might be relevant.

ABataev added inline comments.Aug 26 2021, 5:40 AM
llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll
19–23

Agree. Will check what can be done here to improve it.

ABataev updated this revision to Diff 368885.Aug 26 2021, 8:00 AM

Address comments

Please can you rebase?

Please can you rebase?

Sure, will do, just need to finish my work with other patches.