This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Extract intermediate insertelement for external use
AbandonedPublic

Authored by anton-afanasyev on Feb 11 2022, 2:58 AM.

Details

Summary

If insertelement, extracted from vectorized bundle, is intermediate,
i.e. not the last one, then shuffle vectorized value with original
vector to get correct value for external use.

Fixes PR52275 (#51617)

Diff Detail

Event Timeline

anton-afanasyev requested review of this revision.Feb 11 2022, 2:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 11 2022, 2:58 AM
ABataev added inline comments.Feb 11 2022, 3:38 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
669

Make it a separate NFC patch

Make a separate NFC patch for default parameter of getInsertIndex

anton-afanasyev marked an inline comment as done.Feb 11 2022, 4:52 AM
anton-afanasyev added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
669

Done

Hmm, looks like the problem appears only if we have insertelements to be vectorized not from vectorizeInsertElementInst, where we have a proper check. Maybe just add a check for insertelements in other functions and just expect them from vectorizeInsertElementInst?

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
722

Drop llvm:

730

std::next(Mask.begin(), NumElts) -> Mask.end()

730

Would be good to fill unused elements with UndefMaskElem instead, which may help to improve cost in the future for large vectors.

734

What if InsertIdx is UndefMaskElem?

5690–5711

What if the same insertelement is used in several users? Would not we miss some cost here?

anton-afanasyev marked 4 inline comments as done.Feb 12 2022, 12:52 AM
anton-afanasyev added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
722

Done

730

std::next(Mask.begin(), NumElts) -> Mask.end()

Sure, thanks.

Would be good to fill unused elements with UndefMaskElem instead, which may help to improve cost in the future for large vectors.

What do you mean by unused elements? We have to copy whole vector (with several replaced elements).

734

I forgot about it, thanks. But it may be reasonable to get rid of non-constant and undef indices at all, at buildTree() stage: https://reviews.llvm.org/D119623

anton-afanasyev marked 3 inline comments as done.

Address some comments, rebase against D119623

anton-afanasyev marked an inline comment as done.Feb 12 2022, 12:58 AM

Hmm, looks like the problem appears only if we have insertelements to be vectorized not from vectorizeInsertElementInst, where we have a proper check. Maybe just add a check for insertelements in other functions and just expect them from vectorizeInsertElementInst?

Well, that would definitely solve the problem. There is also another way: on the contrary, remove hasOneUse() check from vectorizeInsertElementInst. This patch allows it.

anton-afanasyev marked an inline comment as done.Feb 12 2022, 1:38 AM
anton-afanasyev added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5690–5711

Why? See line 5924 above, "We only add extract cost once for the same scalar", comment states.

Hmm, looks like the problem appears only if we have insertelements to be vectorized not from vectorizeInsertElementInst, where we have a proper check. Maybe just add a check for insertelements in other functions and just expect them from vectorizeInsertElementInst?

Well, that would definitely solve the problem. There is also another way: on the contrary, remove hasOneUse() check from vectorizeInsertElementInst. This patch allows it.

Yes, sure, but it will require some extra stuff and some time to polish and fix all corner cases. If we’d like to fix it in 14.0, better to start with a simple fix and then try to implement more complex feature.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5690–5711

Currently it is not quite correct for scalars (too optimistic) but will be fixed in future, once improved reductions support is committed. But for insertelements it is not correct for sure. If one instruction is a part of several build vector sequences, you need to calculate the shuffles for each of them independently. Same applies to the final vector emission.

anton-afanasyev marked 2 inline comments as done.Feb 13 2022, 1:14 PM

Well, that would definitely solve the problem. There is also another way: on the contrary, remove hasOneUse() check from vectorizeInsertElementInst. This patch allows it.

Yes, sure, but it will require some extra stuff and some time to polish and fix all corner cases. If we’d like to fix it in 14.0, better to start with a simple fix and then try to implement more complex feature.

Sounds reasonable, thanks! Please take a look at the simple fix: D119679 .

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5690–5711

If one insertelement is a part of several build vector sequences, it is enough to calculate the only one shuffle for any of them, isn't it? Or do you mean issue with scheduling of new shuffles?

ABataev added inline comments.Feb 13 2022, 1:55 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5690–5711

I think you need to calculate the cost for each buildvector sequence, this insertelement is part of. You need to shuffle the resulting vector with n other vectors.

anton-afanasyev abandoned this revision.Mar 19 2022, 6:44 AM
anton-afanasyev marked 2 inline comments as done.

Abandoning this. PR52275 was solved by D119679 and following patches.

Herald added a project: Restricted Project. · View Herald TranscriptMar 19 2022, 6:44 AM