This is an archive of the discontinued LLVM Phabricator instance.

[VectorCombine] allow vector loads with mismatched insert type
ClosedPublic

Authored by spatel on Aug 18 2020, 11:03 AM.

Details

Summary

This is an enhancement to D81766 to allow loading the minimum target vector type into an IR vector with a different number of elements.

In one of the motivating tests from PR16739, SLP creates <2 x float> load ops mixed with <4 x float> insert ops, so we want to handle that pattern in addition to potential oversized vectors created by the vectorizers.

I'm not sure if we should try to model the cost of the identity shuffle as an insert/extract subvector since we are shuffling with undef?

Diff Detail

Event Timeline

spatel created this revision.Aug 18 2020, 11:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 18 2020, 11:03 AM
spatel requested review of this revision.Aug 18 2020, 11:03 AM

I'm not sure if we should try to model the cost of the identity shuffle as an insert/extract subvector since we are shuffling with undef?

This came up on PR43605 - we have an insert subvector cost enum but we can't actually create shufflevectors with an insertion pattern - it requires 2 or more shuffles working together - so its tricky to cost it - I'm wondering if we should replace the insert cost enum with something else (concat/lengthen/whatever). We have a similar problem for extract subvector shuffles - we don't have anything that takes elements from one/both sources but is more than just a basic sequential mask. It comes down to what use cases we have here with the VectorCombiner as well as the vectorizers. In short, the shuffle cost enums don't really match what the ir shufflevector can do.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
148

Are we in danger of creating out of bounds shuffle mask indices if the dst vector type is more than 2x the original size (v2f32 -> v16f32 etc.) ? I think they canonicalize to undef but I'm not sure (+ have no access to the source tree atm)

spatel updated this revision to Diff 288108.Aug 26 2020, 1:47 PM
spatel marked an inline comment as done.

Patch updated:
Fixed shuffle mask creation to not go out-of-bounds for greater than 2x subvector size difference.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
148

Nice catch - yes, that would crash on creation. Adjusted one of the tests to verify that.

RKSimon accepted this revision.Sep 1 2020, 4:25 AM

LGTM with one minor

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
148

(style) Don't call Ty->getNumElements in every loop

Maybe pull it out above the loop as we call it in the SmallVector constuctor as well?

This revision is now accepted and ready to land.Sep 1 2020, 4:25 AM
kazu added a subscriber: kazu.Sep 17 2020, 9:50 AM

I've filed an issue for a performance regression caused by this patch:

https://bugs.llvm.org/show_bug.cgi?id=47558

I've filed an issue for a performance regression caused by this patch:

https://bugs.llvm.org/show_bug.cgi?id=47558

This should get that example back to where it was:
rG48a23bccf373