Reapply r299047, but this time handle correctly splat-masks with undef elements.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
test/CodeGen/X86/shuffle-of-splat-multiuses.ll | ||
---|---|---|
77 ↗ | (On Diff #94895) | These two moves seem like a result of suboptimal scheduling + register allocation choices. |
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
14622–14624 ↗ | (On Diff #94895) | Do you think we'll need to distinguish between the "strict" splat and splat-with-undef cases anywhere else? I'm wondering if it would be better to add a bool "HasAnyUndefs" param to ShuffleVectorSDNode::isSplat(). Even if we don't have another use case at the moment, that would at least make the API more like BuildVectorSDNode::isConstantSplat(). |
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
14622–14624 ↗ | (On Diff #94895) | Not sure about that, i can try to survey the code for such case. |
test/CodeGen/X86/shuffle-of-splat-multiuses.ll | ||
77 ↗ | (On Diff #94895) | Opened Bug 32719 to discuss the extra copy issue |
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
14635–14642 ↗ | (On Diff #95946) | I'm scared that I'm not seeing all of the undef possibilities here (since I definitely missed them the first time!). How about writing this to make the options more visible (not sure if it's more or less efficient), but I'd find this easier to read (warning: untested code): // If the splat mask has no undefs, then return it as-is. // It doesn't matter if the user mask has undefs. We choose the splat mask element because it's efficient to return an existing splat node. ArrayRef<int> SplatMask = Splat->getMask(); if (find(SplatMask, -1) == SplatMask.end()) return SDValue(Splat, 0); // The splat mask has undefs. If the user shuffle does not choose undef elements where the splat does not have undef elements, return the existing splat node. if (all_of(UserMask, [](int Idx) { return Idx == -1 || Splat->getMaskElt(Idx) == Splat->getSplatIndex(); })) return SDValue(Splat, 0); // The splat mask has undefs, and the user shuffle chooses undef elements where the splat has defined elements. Create a more liberal splat mask (one with more undefs). for (unsigned i = 0; i < NumElts; ++i) // No need for "IsSameMask" here because we already handled any case where we wanted to return the existing node. |
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
14635–14642 ↗ | (On Diff #95946) | if (all_of(UserMask, [](int Idx) { return Idx == -1 || Splat->getMaskElt(Idx) == Splat->getSplatIndex(); })) return SDValue(Splat, 0); This is not correct. For example UserMask = [0,2,u,u], SplatMask=[2,u,2,u]. It would be incorrect to return 'Splat' because it would result in an undef at element index=1 which the composition of masks does not yield. |
Restructure in spirit of Sanjay's proposal: first try to identify cases where we can return the splat-shuffle itself, if not, build a new shuffle mask and create a new shuffle node.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
14880 ↗ | (On Diff #98157) | typo: mismatxh |
14886–14892 ↗ | (On Diff #98157) | I'm a coding caveman, so I had not seen this syntax before. I know it takes a line or 2 more, but I think it's easier to read when you give the lambda a name, so something like: auto canSimplifyToExistingSplat = [UserMask, SplatMask]() { ... } if (canSimplifyToExistingSplat()) return SDValue(Splat, 0); |
14887–14890 ↗ | (On Diff #98157) | Given that I got this wrong twice, I'd like to see some mask examples for various undef cases in the code comments. I think the tests cover these, so it should just be a matter of listing those cases here similar to the explanatory note in this review: UserMask = [0,2,u,u], SplatMask=[2,u,2,u] --> ? Bonus points for adding similar comments in the test file to describe the undef handling there too. |
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
14880 ↗ | (On Diff #98157) | Thanks |
14886–14892 ↗ | (On Diff #98157) | Ok |
14887–14890 ↗ | (On Diff #98157) | I agree this will improve the readability of the code and point-out this tricky subtlety. |
Update according to Sanjay's comments. I updated the comments with some representative examples, but gave up on annotating the tests since after Isel it's hard to see from the text which elements were undef as we choose arbitrary values for undef elements.