This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fix assert condition in `foldSelectShuffleOfSelectShuffle`
ClosedPublic

Authored by n-omer on Oct 19 2022, 7:18 AM.

Details

Summary

The assert() is making an assumption that the resulting shuffle mask will always select elements from both vectors, this is untrue in the case of two shuffles being folded if the former shuffle has a mask with undef elements in it. In such a case folding the shuffles might result in a mask which only selects from one of the vectors because the other elements (in the mask) are undef.

Diff Detail

Event Timeline

n-omer created this revision.Oct 19 2022, 7:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 19 2022, 7:18 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
n-omer requested review of this revision.Oct 19 2022, 7:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 19 2022, 7:18 AM
RKSimon added inline comments.Oct 19 2022, 7:24 AM
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2008

clang-format this please :)

llvm/test/Transforms/InstCombine/foldSelectShuffleOfSelectShuffle.ll
3 ↗(On Diff #468906)

Use the update_test_checks script to autogenerate the checks

7 ↗(On Diff #468906)

remove local_unnamed_addr #0

nikic added a subscriber: nikic.Oct 19 2022, 7:43 AM
nikic added inline comments.
llvm/test/Transforms/InstCombine/foldSelectShuffleOfSelectShuffle.ll
3 ↗(On Diff #468906)

...and move this test next to wherever the other tests for this transform are.

spatel added inline comments.Oct 19 2022, 7:57 AM
llvm/test/Transforms/InstCombine/foldSelectShuffleOfSelectShuffle.ll
4 ↗(On Diff #468906)

...and minimize the test :)

No loads or globals are needed to show the bug (if this is filed as a bug somewhere, add a reference).

define <4 x float> @identity_mask(<4 x float>%x, <4 x float> %y) {
  %s1 = shufflevector <4 x float> %x, <4 x float> %y, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef>
  %s2 = shufflevector <4 x float> %s1, <4 x float> %x, <4 x i32> <i32 0, i32 undef, i32 6, i32 7>
  ret <4 x float> %s2
}
n-omer updated this revision to Diff 468917.Oct 19 2022, 8:07 AM

Address review comments

n-omer marked an inline comment as done.Oct 19 2022, 8:07 AM
spatel accepted this revision.Oct 19 2022, 9:10 AM

LGTM.

For reference, the bug was introduced with e239198cdbbf. Before that patch, the pattern would get reduced in multiple steps using demanded elements, so there's no difference in optimization on this kind of shuffle pattern AFAICT.

This revision is now accepted and ready to land.Oct 19 2022, 9:10 AM
RKSimon accepted this revision.Oct 20 2022, 4:54 AM

LGTM - cheers