This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] improve demanded element analysis for vector insert-of-extract
ClosedPublic

Authored by spatel on Aug 24 2020, 7:22 AM.

Details

Summary

InstCombine currently has odd rules for folding insert-extract chains to shuffles, so we miss collapsing seemingly simple cases as shown in the tests here.

But poison makes this not quite as easy as we might have guessed. Alive2 tests to show the subtle difference (similar to the regression tests):
https://alive2.llvm.org/ce/z/hp4hv3 (this is ok)
https://alive2.llvm.org/ce/z/ehEWaN (poison leakage)

SLP tends to create these patterns (as shown in the SLP tests), and this could help with solving PR16739.

Diff Detail

Event Timeline

spatel created this revision.Aug 24 2020, 7:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 24 2020, 7:22 AM
spatel requested review of this revision.Aug 24 2020, 7:22 AM
lebedev.ri retitled this revision from [InstCombine] improve demanded element analysis for insert-of-extract to [InstCombine] improve demanded element analysis for vector insert-of-extract.Aug 24 2020, 9:44 AM
lebedev.ri added inline comments.Aug 24 2020, 9:51 AM
llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
1168

To check my understanding, @ins_of_ext_wrong_demand is fine
because for it PreInsertDemandedElts is not 0, right?

aqjune added a comment.EditedAug 24 2020, 10:27 AM

Alive2's explanation is correct, @ins_of_ext_wrong_demand cannot be folded, as written in the unit test.

llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
1168

I think so - IIUC, each insertelement instruction zero-izes IdxNo bit of PreInsertDemandedElts, and in case of @ins_of_ext_wrong_demand, there aren't sufficient insertelements to make PreInsertDemandedElts fully zero.

This revision is now accepted and ready to land.Aug 24 2020, 10:42 AM
spatel added inline comments.Aug 24 2020, 10:45 AM
llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
1168

That is correct. In the negative test, element 3 is still demanded, so PreInsertDemandedElts is 0b1000 when we get here from the last insertelement.
Demanded elements/bits always requires some mental exercise to follow how it is working. If anyone has suggestions to improve the comments or code, let me know.

spatel added a subscriber: bkramer.Aug 25 2020, 8:21 AM
spatel added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
1155

There's a logic bug with this patch as committed (and subsequently reverted - thanks @bkramer).

We are recursively calling SimplifyDemandedVectorElts() here and that can change UndefElts to a value that does not correspond to the value that we are returning in the block of code added with this patch.

The fix is to try the PreInsertDemandedElts fold first (just re-order the code). The existing fold below that removes the insertelt seems to be immune to that problem, but I might move that too to be safer.