This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Fix incorrect sinking of a truncate into the operand of a shift.
Closed, Public

Authored by andreadb on Sep 1 2016, 1:25 PM.

Details

Reviewers

spatel
RKSimon
arsenm
hfinkel

Commits

rGfd503e5af306: [DAGcombiner] Fix incorrect sinking of a truncate into the operand of a shift.
rL280482: [DAGcombiner] Fix incorrect sinking of a truncate into the operand of a shift.

Summary

This patch fixes a regression introduced by revision 268094.

Revision 268094 added the following dag combine rule:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.size / 2

That rule converts a truncate of a shift-by-constant into a shift of a truncated value. We do this only if the shift count is less than half the size in bits of the truncated value (K < vt.size / 2).
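As a rough illustration of why the fold is sound for small shift counts, here is a scalar sketch in C++ that models an i32 to i16 truncate. The helper names truncOfShl and shlOfTrunc are made up for this example and are not part of LLVM. The sketch assumes K is below 16, so that both shifts are defined.

```cpp
#include <cstdint>

// trunc (shl x, K): shift in the wide (32-bit) type, then keep the low 16 bits.
// Assumes K < 32.
constexpr std::uint16_t truncOfShl(std::uint32_t X, unsigned K) {
  return static_cast<std::uint16_t>(X << K);
}

// shl (trunc x), K: keep the low 16 bits first, then shift and wrap to 16 bits.
// Assumes K < 16; the equivalent i16 shift in LLVM IR is undefined otherwise.
constexpr std::uint16_t shlOfTrunc(std::uint32_t X, unsigned K) {
  return static_cast<std::uint16_t>(
      static_cast<std::uint32_t>(static_cast<std::uint16_t>(X)) << K);
}

// Both orders agree because the low 16 bits of (x << K) depend only on the
// low 16 bits of x.
static_assert(truncOfShl(0x12345678u, 4) == shlOfTrunc(0x12345678u, 4),
              "the fold preserves the value for K < 16");
```

The two functions return the same value for every K from 0 to 15, which is the range in which the narrow shift is still defined.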

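The problem is that the constraint on the shift count is incorrect for vector types. The check uses VT.getSizeInBits(), which for a vector is the width of the whole vector rather than of one element, so the rule can accept a shift count that is larger than the element size.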

Example:

;;
define <8 x i16> @trunc_shift(<8 x i32> %a) {
entry:

%shl = shl <8 x i32> %a, <i32 17, i32 17, i32 17, i32 17, i32 17, i32 17, i32 17, i32 17>
%conv = trunc <8 x i32> %shl to <8 x i16>
ret <8 x i16> %conv

}
;;

According to the above mentioned rule, it is valid to convert the trunc+shift-by-constant into a shift-by-constant of a truncated value.

(v8i16 (trunc (shl (v8i32 %a), <17,17,17,17,17,17,17,17>)))

-->

(v8i16 (shl (v8i16 (trunc (v8i32 %a))), <17,17,17,17,17,17,17,17>))

The problem is that the new "shl" is undefined (the shift count is bigger than the vector element size). So, the dag combiner would later on replace the shift node with an 'undef' value.
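The arithmetic behind this failure can be sketched in a few lines of C++. VecTy below is a hypothetical stand-in for a vector value type and is not LLVM's EVT; oldGuardAccepts and shiftIsDefined are likewise names invented for this example. The sketch only models the old check against the v8i16 case from the example.

```cpp
#include <cstdint>

// Hypothetical stand-in for a vector value type such as v8i16 (not LLVM's EVT).
struct VecTy {
  unsigned NumElts;
  unsigned ScalarBits;
  // Width of the whole vector, which is what the old check compared against.
  constexpr unsigned sizeInBits() const { return NumElts * ScalarBits; }
};

// The guard as written before this patch: K < vt.size / 2.
constexpr bool oldGuardAccepts(VecTy VT, std::uint64_t Amt) {
  return Amt < VT.sizeInBits() / 2;
}

// A shift is only defined when the count is smaller than the element width.
constexpr bool shiftIsDefined(VecTy VT, std::uint64_t Amt) {
  return Amt < VT.ScalarBits;
}

// v8i16 is 128 bits wide, so the old guard accepts any count below 64,
// including 17, even though each element is only 16 bits wide.
static_assert(oldGuardAccepts(VecTy{8, 16}, 17), "17 < 128 / 2");
static_assert(!shiftIsDefined(VecTy{8, 16}, 17), "17 is not < 16");
```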

The combine rule should have been written instead like this:

// trunc (shl x, K) -> shl (trunc x), K => K < vt.getScalarSizeInBits()

Basically, if K is smaller than the "scalar size in bits" of the truncated value, then we know that by "sinking" the truncate into the operand of the shift we would never accidentally make the shift undefined.

This patch fixes the check on the shift count, and adds a test case to show that we no longer fold the entire computation to undef.
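For the scalar boundary cases, a small C++ model may help. It assumes an i64 to i32 truncate; newGuardAccepts and truncShl64To32 are illustrative names only and do not exist in LLVM.

```cpp
#include <cstdint>

// The corrected guard, K < VT.getScalarSizeInBits(), modelled on plain integers.
constexpr bool newGuardAccepts(unsigned ScalarBits, std::uint64_t Amt) {
  return Amt < ScalarBits;
}

// trunc (shl x, K) from i64 to i32, computed without the fold. Assumes K < 64.
constexpr std::uint32_t truncShl64To32(std::uint64_t X, unsigned K) {
  return static_cast<std::uint32_t>(X << K);
}

// K = 31 is the largest count the guard accepts for an i32 result.
static_assert(newGuardAccepts(32, 31), "31 < 32, so the fold is allowed");
// K = 32 is rejected; the unfolded value is zero whatever the input is.
static_assert(!newGuardAccepts(32, 32), "32 is not < 32");
static_assert(truncShl64To32(0xFFFFFFFFFFFFFFFFull, 32) == 0,
              "all of the low 32 bits are shifted out");
// The vector example from the summary (i16 elements, K = 17) is rejected too.
static_assert(!newGuardAccepts(16, 17), "17 is not < 16");
```

This lines up with the scalar tests in reduce-trunc-shl.ll: trunc_shl_31_i32_i64 expects a 32-bit shll $31, while trunc_shl_32_i32_i64 expects a store of the constant 0.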

Please let me know if this is okay to commit.

Thanks,
Andrea

Diff Detail

Repository: rL LLVM

Event Timeline

andreadb updated this revision to Diff 70053. Sep 1 2016, 1:25 PM

andreadb retitled this revision to [DAGCombiner] Fix incorrect sinking of a truncate into the operand of a shift.

andreadb updated this object.

andreadb added reviewers: arsenm, RKSimon, spatel.

andreadb added a subscriber: llvm-commits.

Herald added a subscriber: wdng. Sep 1 2016, 1:25 PM

LGTM, but can you add a scalar case with the original test I added for this which hits the new limit of the full scalar bit width behavior instead of / 2

hfinkel accepted this revision. Sep 1 2016, 4:40 PM

hfinkel added a reviewer: hfinkel.

This revision is now accepted and ready to land. Sep 1 2016, 4:40 PM

In D24154#532078, @arsenm wrote:

LGTM, but can you add a scalar case with the original test I added for this which hits the new limit of the full scalar bit width behavior instead of / 2

Thanks for the quick review.
Sure, I will add extra scalar tests to reduce-trunc-shl.ll.
I can add tests similar to the ones you added in AMDGPU/shift-i64-opts.ll. I will also add tests for the case where the shift count is equal to the scalar size in bits.

Cheers,
Andrea

Closed by commit rL280482: [DAGcombiner] Fix incorrect sinking of a truncate into the operand of a shift. (authored by adibiagio). Sep 2 2016, 4:37 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                   Size
llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp    6 lines
llvm/trunk/test/CodeGen/X86/reduce-trunc-shl.ll        139 lines

Diff 70143

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

(7,182 preceding lines not shown; context: if ((!LegalOperations || TLI.isOperationLegal(ISD::SELECT, SrcVT)) &&)
       SDLoc SL(N0);
       SDValue Cond = N0.getOperand(0);
       SDValue TruncOp0 = DAG.getNode(ISD::TRUNCATE, SL, VT, N0.getOperand(1));
       SDValue TruncOp1 = DAG.getNode(ISD::TRUNCATE, SL, VT, N0.getOperand(2));
       return DAG.getNode(ISD::SELECT, SDLoc(N), VT, Cond, TruncOp0, TruncOp1);
     }
   }

-  // trunc (shl x, K) -> shl (trunc x), K => K < vt.size / 2
+  // trunc (shl x, K) -> shl (trunc x), K => K < VT.getScalarSizeInBits()
   if (N0.getOpcode() == ISD::SHL && N0.hasOneUse() &&
       (!LegalOperations || TLI.isOperationLegalOrCustom(ISD::SHL, VT)) &&
       TLI.isTypeDesirableForOp(ISD::SHL, VT)) {
     if (const ConstantSDNode *CAmt = isConstOrConstSplat(N0.getOperand(1))) {
       uint64_t Amt = CAmt->getZExtValue();
-      unsigned Size = VT.getSizeInBits();
+      unsigned Size = VT.getScalarSizeInBits();

-      if (Amt < Size / 2) {
+      if (Amt < Size) {
         SDLoc SL(N);
         EVT AmtVT = TLI.getShiftAmountTy(VT, DAG.getDataLayout());

         SDValue Trunc = DAG.getNode(ISD::TRUNCATE, SL, VT, N0.getOperand(0));
         return DAG.getNode(ISD::SHL, SL, VT, Trunc,
                            DAG.getConstant(Amt, SL, AmtVT));
       }
     }
(7,856 following lines not shown)

llvm/trunk/test/CodeGen/X86/reduce-trunc-shl.ll

(first 20 lines not shown)
; AVX2-NEXT: vzeroupper
; AVX2-NEXT: retq
  %val = load <4 x i64>, <4 x i64> addrspace(1)* %in
  %shl = shl <4 x i64> %val, <i64 7, i64 7, i64 7, i64 7>
  %trunc = trunc <4 x i64> %shl to <4 x i32>
  store <4 x i32> %trunc, <4 x i32> addrspace(1)* %out
  ret void
}

				define <8 x i16> @trunc_shl_v8i16_v8i32(<8 x i32> %a) {
				; SSE2-LABEL: trunc_shl_v8i16_v8i32:
				; SSE2: # BB#0:
				; SSE2-NEXT: pslld $17, %xmm0
				; SSE2-NEXT: pslld $17, %xmm1
				; SSE2-NEXT: pslld $16, %xmm1
				; SSE2-NEXT: psrad $16, %xmm1
				; SSE2-NEXT: pslld $16, %xmm0
				; SSE2-NEXT: psrad $16, %xmm0
				; SSE2-NEXT: packssdw %xmm1, %xmm0
				; SSE2-NEXT: retq
				;
				; AVX2-LABEL: trunc_shl_v8i16_v8i32:
				; AVX2: # BB#0:
				; AVX2-NEXT: vpslld $17, %ymm0, %ymm0
				; AVX2-NEXT: vpshufb {{.*#+}} ymm0 = ymm0[0,1,4,5,8,9,12,13],zero,zero,zero,zero,zero,zero,zero,zero,ymm0[16,17,20,21,24,25,28,29],zero,zero,zero,zero,zero,zero,zero,zero
				; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,2,2,3]
				; AVX2-NEXT: # kill: %XMM0<def> %XMM0<kill> %YMM0<kill>
				; AVX2-NEXT: vzeroupper
				; AVX2-NEXT: retq
				%shl = shl <8 x i32> %a, <i32 17, i32 17, i32 17, i32 17, i32 17, i32 17, i32 17, i32 17>
				%conv = trunc <8 x i32> %shl to <8 x i16>
				ret <8 x i16> %conv
				}

				define void @trunc_shl_31_i32_i64(i32* %out, i64* %in) {
				; SSE2-LABEL: trunc_shl_31_i32_i64:
				; SSE2: # BB#0:
				; SSE2-NEXT: movl (%rsi), %eax
				; SSE2-NEXT: shll $31, %eax
				; SSE2-NEXT: movl %eax, (%rdi)
				; SSE2-NEXT: retq
				;
				; AVX2-LABEL: trunc_shl_31_i32_i64:
				; AVX2: # BB#0:
				; AVX2-NEXT: movl (%rsi), %eax
				; AVX2-NEXT: shll $31, %eax
				; AVX2-NEXT: movl %eax, (%rdi)
				; AVX2-NEXT: retq
				%val = load i64, i64* %in
				%shl = shl i64 %val, 31
				%trunc = trunc i64 %shl to i32
				store i32 %trunc, i32* %out
				ret void
				}

				define void @trunc_shl_32_i32_i64(i32* %out, i64* %in) {
				; SSE2-LABEL: trunc_shl_32_i32_i64:
				; SSE2: # BB#0:
				; SSE2-NEXT: movl $0, (%rdi)
				; SSE2-NEXT: retq
				;
				; AVX2-LABEL: trunc_shl_32_i32_i64:
				; AVX2: # BB#0:
				; AVX2-NEXT: movl $0, (%rdi)
				; AVX2-NEXT: retq
				%val = load i64, i64* %in
				%shl = shl i64 %val, 32
				%trunc = trunc i64 %shl to i32
				store i32 %trunc, i32* %out
				ret void
				}

				define void @trunc_shl_15_i16_i64(i16* %out, i64* %in) {
				; SSE2-LABEL: trunc_shl_15_i16_i64:
				; SSE2: # BB#0:
				; SSE2-NEXT: movzwl (%rsi), %eax
				; SSE2-NEXT: shlw $15, %ax
				; SSE2-NEXT: movw %ax, (%rdi)
				; SSE2-NEXT: retq
				;
				; AVX2-LABEL: trunc_shl_15_i16_i64:
				; AVX2: # BB#0:
				; AVX2-NEXT: movzwl (%rsi), %eax
				; AVX2-NEXT: shlw $15, %ax
				; AVX2-NEXT: movw %ax, (%rdi)
				; AVX2-NEXT: retq
				%val = load i64, i64* %in
				%shl = shl i64 %val, 15
				%trunc = trunc i64 %shl to i16
				store i16 %trunc, i16* %out
				ret void
				}

				define void @trunc_shl_16_i16_i64(i16* %out, i64* %in) {
				; SSE2-LABEL: trunc_shl_16_i16_i64:
				; SSE2: # BB#0:
				; SSE2-NEXT: movw $0, (%rdi)
				; SSE2-NEXT: retq
				;
				; AVX2-LABEL: trunc_shl_16_i16_i64:
				; AVX2: # BB#0:
				; AVX2-NEXT: movw $0, (%rdi)
				; AVX2-NEXT: retq
				%val = load i64, i64* %in
				%shl = shl i64 %val, 16
				%trunc = trunc i64 %shl to i16
				store i16 %trunc, i16* %out
				ret void
				}

				define void @trunc_shl_7_i8_i64(i8* %out, i64* %in) {
				; SSE2-LABEL: trunc_shl_7_i8_i64:
				; SSE2: # BB#0:
				; SSE2-NEXT: movb (%rsi), %al
				; SSE2-NEXT: shlb $7, %al
				; SSE2-NEXT: movb %al, (%rdi)
				; SSE2-NEXT: retq
				;
				; AVX2-LABEL: trunc_shl_7_i8_i64:
				; AVX2: # BB#0:
				; AVX2-NEXT: movb (%rsi), %al
				; AVX2-NEXT: shlb $7, %al
				; AVX2-NEXT: movb %al, (%rdi)
				; AVX2-NEXT: retq
				%val = load i64, i64* %in
				%shl = shl i64 %val, 7
				%trunc = trunc i64 %shl to i8
				store i8 %trunc, i8* %out
				ret void
				}

				define void @trunc_shl_8_i8_i64(i8* %out, i64* %in) {
				; SSE2-LABEL: trunc_shl_8_i8_i64:
				; SSE2: # BB#0:
				; SSE2-NEXT: movb $0, (%rdi)
				; SSE2-NEXT: retq
				;
				; AVX2-LABEL: trunc_shl_8_i8_i64:
				; AVX2: # BB#0:
				; AVX2-NEXT: movb $0, (%rdi)
				; AVX2-NEXT: retq
				%val = load i64, i64* %in
				%shl = shl i64 %val, 8
				%trunc = trunc i64 %shl to i8
				store i8 %trunc, i8* %out
				ret void
				}