This is an archive of the discontinued LLVM Phabricator instance.

Differential D55274

[DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef.
Closed, Public

Authored by andreadb on Dec 4 2018, 8:12 AM.

Details

Reviewers

RKSimon
craig.topper
spatel

Commits

rG52a2bac583f3: [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef.
rL348522: [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef.

Summary

This patch introduces a new DAGCombiner rule to simplify concat_vectors nodes:

concat_vectors( bitcast (scalar_to_vector %A), UNDEF) --> bitcast (scalar_to_vector %A)
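The shape of this rewrite can be illustrated with a toy model. This is only a sketch under stated assumptions: `Node`, `Type`, `make` and `foldConcatOfScalarToVector` are hypothetical names invented for the illustration, not LLVM APIs, and the types used in the usage note (f64, v2f64, v4f32, v8f32) are chosen for the example only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins for SelectionDAG concepts; none of this is the LLVM API.
enum class Opcode { Scalar, Undef, ScalarToVector, Bitcast, ConcatVectors };

struct Type {
  unsigned eltBits; // width of one element (or of the scalar itself)
  unsigned numElts; // 0 means "not a vector"
  unsigned bits() const { return numElts ? eltBits * numElts : eltBits; }
  bool operator==(const Type &o) const {
    return eltBits == o.eltBits && numElts == o.numElts;
  }
};

struct Node;
using NodeRef = std::shared_ptr<Node>;
struct Node {
  Opcode op;
  Type ty;
  std::vector<NodeRef> ops;
};

NodeRef make(Opcode op, Type ty, std::vector<NodeRef> ops = {}) {
  return std::make_shared<Node>(Node{op, ty, std::move(ops)});
}

// Returns the simplified node, or nullptr when the pattern does not match.
NodeRef foldConcatOfScalarToVector(const NodeRef &n) {
  if (n->op != Opcode::ConcatVectors || n->ops.empty())
    return nullptr;
  // Every operand after the first must be undef.
  for (std::size_t i = 1; i < n->ops.size(); ++i)
    if (n->ops[i]->op != Opcode::Undef)
      return nullptr;

  // Look through an optional bitcast on the first operand.
  NodeRef in = n->ops[0];
  if (in->op == Opcode::Bitcast)
    in = in->ops[0];
  if (in->op != Opcode::ScalarToVector)
    return nullptr;

  // Only fold when the scalar has exactly the vector's element width.
  NodeRef scalar = in->ops[0];
  if (scalar->ty.numElts != 0 || scalar->ty.eltBits != in->ty.eltBits)
    return nullptr;

  // Rebuild scalar_to_vector at the full width, then bitcast to the
  // type of the original concat_vectors.
  Type wide{scalar->ty.eltBits, n->ty.bits() / scalar->ty.eltBits};
  return make(Opcode::Bitcast, n->ty,
              {make(Opcode::ScalarToVector, wide, {scalar})});
}
```

For example, folding a v8f32 concat_vectors of (bitcast v4f32 (scalar_to_vector v2f64 %A), undef) in this model yields bitcast v8f32 (scalar_to_vector v4f64 %A). The toy omits the one-use and legality conditions that a real combine would also have to check.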

This patch only partially addresses PR39257. In particular, it is enough to fix one of the two problematic cases mentioned in PR39257. However, it is not enough to fix the original test case from Craig in PR39257; that particular case would probably require a more complicated approach (and knowledge about used bits).

Before this patch, we used to generate the following code for function PR39257 (-mtriple=x86_64, -mattr=+avx):

vmovsd  (%rdi), %xmm0           # xmm0 = mem[0],zero
vxorps  %xmm1, %xmm1, %xmm1
vblendps        $3, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2,3]
vmovaps %ymm0, (%rsi)
vzeroupper
retq

Now we generate this:

vmovsd  (%rdi), %xmm0           # xmm0 = mem[0],zero
vmovaps %ymm0, (%rsi)
vzeroupper
retq
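A small lane-level model may help show why the removed vxorps/vblendps pair was redundant. It is a sketch that models only the lane annotations in the assembly comments above: the load already leaves lanes 2 and 3 of xmm0 zeroed, so taking those lanes from a zero register changes nothing. The helper names `loadLow64ZeroUpper` and `blend` are hypothetical.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;

// Models "xmm0 = mem[0],zero": the low 64 bits (two floats here) come
// from memory and the remaining lanes are zeroed.
Vec4 loadLow64ZeroUpper(float m0, float m1) { return {m0, m1, 0.0f, 0.0f}; }

// Models the blend annotation "xmm0 = xmm0[0,1],xmm1[2,3]" for mask 3:
// lanes whose mask bit is set come from `sel`, the others from `other`.
Vec4 blend(unsigned mask, const Vec4 &sel, const Vec4 &other) {
  Vec4 r{};
  for (unsigned i = 0; i < 4; ++i)
    r[i] = ((mask >> i) & 1u) ? sel[i] : other[i];
  return r;
}
```

With `x = loadLow64ZeroUpper(a, b)` and an all-zero vector `z`, `blend(3, x, z)` is equal to `x` for any `a` and `b`, which is the redundancy the new combine removes.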

As a side note: that VZEROUPPER is completely redundant...

I guess the vzeroupper insertion pass doesn't realize that the definition of %xmm0 from vmovsd is already zeroing the upper half of %ymm0. Note that on -mcpu=btver2, we don't get that vzeroupper because the vzeroupper insertion pass is disabled.

Diff Detail

Repository: rL LLVM

Event Timeline

andreadb created this revision. Dec 4 2018, 8:12 AM

Maybe call the test file combine-concatvectors.ll? That matches the other filenames we have.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16327	(On Diff #176636)	Use peekThroughOneUseBitcasts?
16342	(On Diff #176636)	This is all very similar to concat_vectors(scalar, undef) -> scalar_to_vector(sclr) at the beginning of visitCONCAT_VECTORS.

Thanks Simon for the feedback.
My new combine logic was indeed very similar to the existing logic in visitCONCAT_VECTORS.
This patch reuses that logic and introduces a new rule for the case where the first operand of the concat_vectors is a scalar_to_vector.

For readability (and simplicity), I moved all of that logic from visitCONCAT_VECTORS into a separate function.

I had to tweak combineConcatVectorOfScalars() because, before this patch, visitCONCAT_VECTORS used to exit early if it failed to fold a 'concat_vectors of a scalar'.
With this patch, we don't bail out immediately from visitCONCAT_VECTORS. Instead, we try other combine rules. As a consequence, we potentially end up calling combineConcatVectorOfScalars() more often.
The problem with running combineConcatVectorOfScalars() more often is that there are cases where it returns a sub-optimal BUILD_VECTOR(Scalar, UNDEF) instead of a much simpler (and more canonical) SCALAR_TO_VECTOR(Scalar).
That particular build_vector is poorly lowered by the x86 backend if the target only has SSE but not SSE2. So, test CodeGen/X86/vec_fneg.ll was reporting a failure after that refactoring (using -mtriple=i686 -mattr=+sse). Fixing that particular corner case was enough to get back the "expected codegen".
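The control-flow difference described above can be sketched with a toy rule driver. This is an illustration only: `Rule` and `visit` are hypothetical names, nodes are modelled as plain strings, and nothing here reflects the real DAGCombiner interfaces.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// A rule returns the rewritten node, or an empty string when it declines.
using Rule = std::function<std::string(const std::string &)>;

// Tries `first`, then (optionally) the remaining rules in order.
// With earlyExit == true, a failed first rule ends the visit, so the
// later rules never run on this node.
std::string visit(const std::string &node, const Rule &first,
                  const std::vector<Rule> &rest, bool earlyExit) {
  std::string r = first(node);
  if (!r.empty())
    return r;
  if (earlyExit)
    return "";
  for (const Rule &rule : rest) {
    r = rule(node);
    if (!r.empty())
      return r;
  }
  return "";
}
```

In this model, removing the early exit means a later rule can fire on a node that previously went untouched, which is how a less canonical form such as BUILD_VECTOR(Scalar, UNDEF) could start to appear.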

RKSimon added inline comments. Dec 5 2018, 7:45 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16512	(On Diff #176813)	check for implicit truncation of the scalar value
test/CodeGen/X86/combine-concatvectors.ll
2	(On Diff #176813)	Please commit this with trunk's current codegen and rebase so we see the diff

Also, it's probably worth doing the initial simplifyConcatVectors refactor as an NFC.

andreadb marked 2 inline comments as done. Dec 5 2018, 7:57 AM

andreadb added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16512	(On Diff #176813)	Right. I will add that check.
test/CodeGen/X86/combine-concatvectors.ll
2	(On Diff #176813)	Will do.

Diffusion mentioned this in rL348380: [X86] Add test case to show missed opportunity to combine a concat_vector into…. Dec 5 2018, 8:26 AM

Patch updated.

Initially, I wanted to move that combine logic into a separate function.
However, the early return in the original combine logic caused problems once the code was moved to a separate function.
I was originally under the impression that the change was safe. However, after running more tests, I found that if we don't bail out immediately from visitCONCAT_VECTORS, we may introduce odd regressions due to illegal build_vector DAG nodes (when testing for an SSE1 i686 target), since we potentially trigger other combine rules, which may in turn introduce more problematic illegal vector types.

So, I opted for this simpler (and safer) approach.

LGTM - thanks

This revision is now accepted and ready to land. Dec 6 2018, 8:24 AM

Closed by commit rL348522: [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef. (authored by adibiagio). Dec 6 2018, 11:58 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	16 lines
llvm/trunk/test/CodeGen/X86/combine-concatvectors.ll	2 lines

Diff 177023

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

SDValue DAGCombiner::visitCONCAT_VECTORS(SDNode *N) {

   // Optimize concat_vectors where all but the first of the vectors are undef.
   if (std::all_of(std::next(N->op_begin()), N->op_end(), [](const SDValue &Op) {
         return Op.isUndef();
       })) {
     SDValue In = N->getOperand(0);
     assert(In.getValueType().isVector() && "Must concat vectors");

-    // Transform: concat_vectors(scalar, undef) -> scalar_to_vector(sclr).
-    if (In->getOpcode() == ISD::BITCAST &&
-        !In->getOperand(0).getValueType().isVector()) {
-      SDValue Scalar = In->getOperand(0);
+    SDValue Scalar = peekThroughOneUseBitcasts(In);

+    // concat_vectors(scalar_to_vector(scalar), undef) ->
+    // scalar_to_vector(scalar)
+    if (!LegalOperations && Scalar.getOpcode() == ISD::SCALAR_TO_VECTOR &&
+        Scalar.hasOneUse()) {
+      EVT SVT = Scalar.getValueType().getVectorElementType();
+      if (SVT == Scalar.getOperand(0).getValueType())
+        Scalar = Scalar.getOperand(0);
+    }
+
+    // concat_vectors(scalar, undef) -> scalar_to_vector(scalar)
+    if (!Scalar.getValueType().isVector()) {
       // If the bitcast type isn't legal, it might be a trunc of a legal type;
       // look through the trunc so we can still do the transform:
       //   concat_vectors(trunc(scalar), undef) -> scalar_to_vector(scalar)
       if (Scalar->getOpcode() == ISD::TRUNCATE &&
           !TLI.isTypeLegal(Scalar.getValueType()) &&
           TLI.isTypeLegal(Scalar->getOperand(0).getValueType()))
         Scalar = Scalar->getOperand(0);


llvm/trunk/test/CodeGen/X86/combine-concatvectors.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=+avx < %s | FileCheck %s

 define void @PR32957(<2 x float>* %in, <8 x float>* %out) {
 ; CHECK-LABEL: PR32957:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
-; CHECK-NEXT: vxorps %xmm1, %xmm1, %xmm1
-; CHECK-NEXT: vblendps {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3]
 ; CHECK-NEXT: vmovaps %ymm0, (%rsi)
 ; CHECK-NEXT: vzeroupper
 ; CHECK-NEXT: retq
   %ld = load <2 x float>, <2 x float>* %in, align 8
   %ext = extractelement <2 x float> %ld, i64 0
   %ext2 = extractelement <2 x float> %ld, i64 1
   %ins = insertelement <8 x float> <float undef, float undef, float 0.0, float 0.0, float 0.0, float 0.0, float 0.0, float 0.0>, float %ext, i64 0
   %ins2 = insertelement <8 x float> %ins, float %ext2, i64 1
   store <8 x float> %ins2, <8 x float>* %out, align 32
   ret void
 }