This is an archive of the discontinued LLVM Phabricator instance.

[DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef.
ClosedPublic

Authored by andreadb on Dec 4 2018, 8:12 AM.

Details

Summary

This patch introduces a new DAGCombiner rule to simplify concat_vectors nodes:

concat_vectors( bitcast (scalar_to_vector %A), UNDEF) --> bitcast (scalar_to_vector %A)

This patch only partially addresses PR39257. In particular, it is enough to fix one of the two problematic cases mentioned in PR39257. However, it is not enough to fix the original test case from Craig in PR39257; that particular case would probably require a more complicated approach (and knowledge about used bits).

Before this patch, we used to generate the following code for function PR39257 (-mtriple=x86_64 , -mattr=+avx):

vmovsd  (%rdi), %xmm0           # xmm0 = mem[0],zero
vxorps  %xmm1, %xmm1, %xmm1
vblendps        $3, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2,3]
vmovaps %ymm0, (%rsi)
vzeroupper
retq

Now we generate this:

vmovsd  (%rdi), %xmm0           # xmm0 = mem[0],zero
vmovaps %ymm0, (%rsi)
vzeroupper
retq

As a side note: that VZEROUPPER is completely redundant...

I guess the vzeroupper insertion pass doesn't realize that the definition of %xmm0 from vmovsd is already zeroing the upper half of %ymm0. Note that on -mcpu=btver2, we don't get that vzeroupper because pass vzeroupper insertion pass is disabled.

Diff Detail

Repository
rL LLVM

Event Timeline

andreadb created this revision.Dec 4 2018, 8:12 AM

Maybe call the test file combine-concatvectors.ll ? That matches the other filenames we have.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16327 ↗(On Diff #176636)

Use peekThroughOneUseBitcasts ?

16342 ↗(On Diff #176636)

This is all very similar to concat_vectors(scalar, undef) -> scalar_to_vector(sclr) at the beginning of visitCONCAT_VECTORS

andreadb updated this revision to Diff 176813.Dec 5 2018, 7:37 AM
andreadb marked an inline comment as done.

Thanks Simon for the feedback.
My new combine logic was indeed very similar to the existing logic in visitCONCAT_VECTORS.
This patch reuses that logic and introduces a new rule for the case where the first operand of the concat_vector is a scalar_to_vector.

For readability reasons (and simplicity), i moved all of that logic from visitCONCAT_VECTORS into a separate function.

The reason why I had to tweak combineConcatVectorOfScalars() is because before this patch, visitCONCAT_VECTORS used to early exit from visitCONCAT_VECTORS if it failed to fold a 'concat_vectors of a scalar'.
With this patch, we don't bail out immediately from visitCONCAT_VECTORS. Instead, we try other combine rules. As a consequence, we potentially end up calling function combineConcatVectorOfScalars() more often.
The problem with running combineConcatVectorOfScalars() more often is that there are cases where combineConcatVectorOfScalars() returns a sub-optimal BUILD_VECTOR(Scalar, UNDEF), insted of a much simpler (and more canonical) SCALAR_TO_VECTOR(Scalar).
That particular build_vector is poorly lowered by the x86 backend if the target only has SSE but not SSE2. So, test codegen/x86/vec_fneg.ll was reporting a failure after that refactoring (using -mtriple=i686 -mattr=+sse). Fixing that particular corner case was enough to get back the "expected codegen".

RKSimon added inline comments.Dec 5 2018, 7:45 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16512 ↗(On Diff #176813)

check for implicit truncation of the scalar value

test/CodeGen/X86/combine-concatvectors.ll
2 ↗(On Diff #176813)

Please commit this with trunk's current codegen and rebase so we see the diff

Also, its probably worth doing the initial simplifyConcatVectors refactor as a NFC

andreadb marked 2 inline comments as done.Dec 5 2018, 7:57 AM
andreadb added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16512 ↗(On Diff #176813)

Right. I will add that check.

test/CodeGen/X86/combine-concatvectors.ll
2 ↗(On Diff #176813)

Will do.

andreadb updated this revision to Diff 176858.Dec 5 2018, 11:14 AM

Patch updated.

Initially, I wanted to move thal combine logic into a separate function.
However, the presence of an early return in the original combine logic ended up causing problems once the code was moved to a separate function.
I was originally under the impression that the change was safe. However - after running more tests - if we don't bail out immediately from the visitCONCAT_VECTOR function, we may end up introducing odd regressions due to the presence of illegal build_vectors dag nodes (when testing for an SSE1 i686 target) (since we potentially trigger other combine rules which may in turn introduce more problematic illegal vector types).

So, I opted for this simpler (and safer) approach.

RKSimon accepted this revision.Dec 6 2018, 8:24 AM

LGTM - thanks

This revision is now accepted and ready to land.Dec 6 2018, 8:24 AM
This revision was automatically updated to reflect the committed changes.