This is an archive of the discontinued LLVM Phabricator instance.

[X86] Turn selects with constant condition into vector shuffles during DAG combine
ClosedPublic

Authored by craig.topper on Feb 15 2018, 6:45 PM.

Details

Summary

Currently we convert to shuffles during lowering. This moves it to DAG combine so hopefully we can get it done before type legalization has to extend the condition.

I believe in some cases we're creating SHRUNKBLENDs that end up with constant conditions because we see the extended on the condition and think its a dynamic selelect before DAG combine gets a chance to constant fold the extend. We could add combines to turn SHRUNKBLENDs with constant condition back to vselect. But it seemed like it might be better to just send them to shuffles as early as possible so they never get a chance to become SHRUNKBLENDs. This the reason some tests went from blends controlled by a constant pool load to just move.

Some of the constant pool entries changed because the sign_extend introduced by type legalization turned undef elements in select condition into 0s. While the select->shuffle used -1 in the shuffle mask. So now the shuffle lowering can do what it wants with them.

I'll remove the lowering code as a follow up. We might be able to simplify some of the pre-checks for SHRUNKBLEND as the FIXME there says.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision.Feb 15 2018, 6:45 PM

If we can get rid of SHRUNKBLEND, that would be great.

But we need to be careful about undef here. That caused trouble when we added the equivalent canonicalization in InstCombine. See:
https://bugs.llvm.org/show_bug.cgi?id=32486 / D31980

We ended up with this logic in IR, so I think we want the same in the DAG:

// Undef in a select condition (choose one of the operands) does not mean
// the same thing as undef in a shuffle mask (any value is acceptable), so
// give up.

I'm not trying to get rid of SHRUNKBLEND. I'm trying to fix the code to do what it thinks it is doing. It currently checks that the condition is only used by other VSELECTS, but it does this after the call to SimplifyDemandedBits. SimplifyDemandedBits has a one use check inside it that forces the demanded mask to all 1s. So if there were other users(selects or not) we didn't simplify anything that requires a SHRUNKBLEND to be created. So I tried to pre-check for the VSELECT usages and pass in the AssumeSingleUse flag to disable the force to 1s. Then I started seeing poor codegen because we weren't able to constant fold the SHRUNKBLEND to a shuffle.

For the shuffle mask problem, can I just force undef to 0? I believe that's what the sign extend from type promotion was doing already. Though in the AVX512 case we're probably doing the wrong thing today because won't type promote and convert to shuffle during lowering the same way I did in this patch.

For the shuffle mask problem, can I just force undef to 0?

If you mean always select the undef lane from a specific operand, then yes, that's a valid option. So:
select <0, undef, 0, 1> %x, %y
becomes:
select <0, 0, 0, 1> %x, %y
becomes:
shuffle %x, %y, <4, 5, 6, 3>

We could do better than that though, and this was mentioned in the IR bug. If we know that the select can be eliminated entirely by choosing from one operand or the other, then that's always a win. We get this half-right in IR - which looks like a shortcoming in InstCombiner::SimplifyDemandedVectorElts():

define <3 x i8> @sel_x_better(<3 x i8> %x, <3 x i8> %y) {
  %s = select <3 x i1><i1 1, i1 undef, i1 1>, <3 x i8> %x, <3 x i8> %y
  ret <3 x i8> %s
}

define <3 x i8> @sel_y_better(<3 x i8> %x, <3 x i8> %y) {
  %s = select <3 x i1><i1 0, i1 undef, i1 0>, <3 x i8> %x, <3 x i8> %y
  ret <3 x i8> %s
}
$ ./opt -instcombine sel.ll -S

define <3 x i8> @sel_x_better(<3 x i8> %x, <3 x i8> %y) {
  ret <3 x i8> %x
}

define <3 x i8> @sel_y_better(<3 x i8> %x, <3 x i8> %y) {
  %s = select <3 x i1> <i1 false, i1 undef, i1 false>, <3 x i8> %x, <3 x i8> %y
  ret <3 x i8> %s
}

We could do better than that though, and this was mentioned in the IR bug. If we know that the select can be eliminated entirely by choosing from one operand or the other, then that's always a win. We get this half-right in IR - which looks like a shortcoming in InstCombiner::SimplifyDemandedVectorElts():

This looks fine in DAGCombiner via the undef support in:

// Fold (vselect (build_vector all_ones), N1, N2) -> N1
if (ISD::isBuildVectorAllOnes(N0.getNode()))
  return N1;
// Fold (vselect (build_vector all_zeros), N1, N2) -> N2
if (ISD::isBuildVectorAllZeros(N0.getNode()))
  return N2;

...so we shouldn't regress those cases at least.

Treat undef as selecting the RHS. This removed the change from vselect.ll

Some of the other changes look to be using an inverted mask with swapped operands.

Forgot to mention, we already have a target independt DAG combine for all zeros or all ones select condition that I believe considers undefs.

spatel accepted this revision.Feb 16 2018, 3:35 PM

LGTM.

lib/Target/X86/X86ISelLowering.cpp
31553 ↗(On Diff #134715)

This could use a comment like:
Arbitrarily choose from the 2nd operand if the select condition element is undef.
TODO: Can we do better by matching patterns such as even/odd?

This revision is now accepted and ready to land.Feb 16 2018, 3:35 PM
This revision was automatically updated to reflect the committed changes.