This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Don't generate atrocious shuffles for constant vectors
ClosedPublic

Authored by mkuper on Jan 21 2015, 6:25 AM.

Details

Summary

According to PR22276, generating constant vectors (and, especially, zero vectors) can sometimes result in very odd code.
This is especially true for zero vectors, where a single xor should be sufficient, but instead shuffles are generated.

This adds a DAGCombine that removes a redundant shuffle for this case.
Note that lowering the resulting BUILD_VECTOR to a constant broadcast vs. a constant pool load vs. something more efficient (e.g. a xor for the case of 0) is a target decision, and isn't made here.

Diff Detail

Event Timeline

mkuper updated this revision to Diff 18511.Jan 21 2015, 6:25 AM
mkuper retitled this revision from to [DAGCombine] Don't generate atrocious shuffles for constant vectors.
mkuper updated this object.
mkuper edited the test plan for this revision. (Show Details)
mkuper added reviewers: spatel, chandlerc.
mkuper added a subscriber: Unknown Object (MLST).

Thanks Michael - see comment.

test/CodeGen/X86/widen_shuffle-1.ll
92

This looks like its leaving a shuffle of a constant vector - shouldn't this be constant folded away?

Thanks, Simon!

test/CodeGen/X86/widen_shuffle-1.ll
92

It should, but that would require an additional DAGCombine.

What this code test up generating is two vector_shuffle nodes, one comes directly from the shufflevector and the other appears during type legalization. This patch simplifies the first one, but the other one remains.

spatel edited edge metadata.Jan 21 2015, 8:45 AM

Hi Michael -

This is almost exactly what I had in mind (and probably Simon did too as he was looking at this sort of problem as well).

But shortly after filing the bug, I realized that we also need to handle non-const splats better too. Eg, this is a simplification of what I'm seeing in loops where the induction variable is splatted in order to vectorize the loop:

define <4 x i64> @splat(i64 %x) {
  %scalar_to_vector = insertelement <4 x i64> undef, i64 %x, i32 0
  %splat = shufflevector <4 x i64> %scalar_to_vector, <4 x i64> undef, <4 x i32> zeroinitializer
  ret <4 x i64> %splat
}

Can you loosen the splat check to handle more than just constants? The particular case above seems ok with AVX2 (vbroadcastsd is generated), but AVX1 still shows:

vmovq	%rdi, %xmm0
vunpcklpd	%xmm0, %xmm0, %xmm0 ## xmm0 = xmm0[0,0]
vinsertf128	$1, %xmm0, %ymm0, %ymm0

Hi Michael,

Most of the canonicalization rules performed by the dag combiner on shuffle nodes are the same rules implemented by method 'SelectionDAG::getVectorShuffle' in SelectionDAG.cpp.
If you add a new canonicalization rule in method getVectorShuffle, then you probably need to update method 'getVectorShuffle' as well. I guess, this would (hopefully) fix the problem with the missing constant folding reported by Simon.

I hope this helps!
Andrea

Thanks Sanjay, Andrea.

Sanjay, just removing the condition doesn't fix this example.
What do you think about committing the constant version first, and then looking at the rest?

Andrea, I'm not sure I understood.
Do you prefer the logic to live only in getVectorShuffle() or be replicated both here and there? I'm not sure the first is sufficient. In any case, I'm not sure that would solve the issue Simon reported. That case also involves a bitcast that changes the number of vector elements, and don't know if that's currently handled. I'll check.

What do you think about committing the constant version first, and then looking at the rest?

Sure, I have no problem with that. I'll file a separate bug so we don't lose it.

Andrea, I'm not sure I understood.
Do you prefer the logic to live only in getVectorShuffle() or be replicated both here and there? I'm not sure the first is sufficient.

I think you would need to fix both places.
When a new shuffle node is created, method 'getVectorShuffle' ensures that the resulting node is always in a canonical form. The reason the same canonicalization rules in 'getVectorShuffle' are performed by 'visitVECTOR_SHUFFLE' is because we want to preserve the canonical form of shuffles when optimizing the dag. For example, without those rules, we would break the canonical form of a shuffle if Undef gets propagated to its first operand. So, my opinion is that you should replicate that rule in both places. It's not ideal, I agree. Ideally if possible, those rules should be factored out into a common function at some point but with the current implementation we need to update both.

In any case, I'm not sure that would solve the issue Simon reported. That case also involves a bitcast that changes the number of vector elements, and don't know if that's currently handled. I'll check.

Right. getVectorShuffle would look through any bitcast, however it would not try to simplify the shuffle if the bitcast changes the number of elements. So it would not fix the problem with that test.

mkuper updated this revision to Diff 18594.Jan 22 2015, 1:21 AM
mkuper edited reviewers, added: andreadb, RKSimon; removed: chandlerc.
mkuper removed subscribers: andreadb, RKSimon.

Updated getVectorShuffle()

Also fixed a potential bug with respect to bitcasts. I couldn't actually hit this for constants because bitcasts of constants get resolved earlier, but this is safer, and, in any case, is required if the constant constraint is removed.

andreadb accepted this revision.Jan 22 2015, 3:21 AM
andreadb edited edge metadata.

Hi Michael,

the patch LGTM. Thanks!

This revision is now accepted and ready to land.Jan 22 2015, 3:21 AM
This revision was automatically updated to reflect the committed changes.