This is an archive of the discontinued LLVM Phabricator instance.

[x86] Correct the implementation of isTruncateFree to be more accurate
Changes Planned, Public

Authored by craig.topper on Sep 26 2017, 4:52 PM.

Details

Reviewers

RKSimon
zvi
spatel

Summary

Currently we return true as long as the source type is larger than the destination type, but a truncate is only "free" if it can be done with a subregister extract. This patch corrects the implementation to match that.
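A minimal, standalone sketch of what "subregister extract" means here. It uses the RAX/EAX/AX/AL family as a stand-in for any general-purpose register, and the names raxAlias and isSubregisterTruncate are hypothetical (they are not LLVM functions). When SrcBits > DstBits, the predicate accepts exactly the width pairs the patch lists (sources 64/32/16, destinations 32/16/8). It ignores register-class details such as 32-bit mode, where only EAX through EDX have addressable low bytes.

```cpp
#include <string_view>

// Hypothetical helper (not an LLVM API): name of the RAX-family register that
// holds an integer of exactly this width, or an empty string_view if none.
constexpr std::string_view raxAlias(unsigned Bits) {
  switch (Bits) {
  case 64: return "rax";
  case 32: return "eax";
  case 16: return "ax";
  case 8:  return "al";
  default: return {};  // e.g. i1 or i128: no single RAX-family register
  }
}

// Modeled as free when both widths name a register in the same family and the
// destination is narrower: reading %eax after writing %rax just reinterprets
// the low 32 bits, so no instruction is emitted.
constexpr bool isSubregisterTruncate(unsigned SrcBits, unsigned DstBits) {
  return SrcBits > DstBits && !raxAlias(SrcBits).empty() &&
         !raxAlias(DstBits).empty();
}

static_assert(isSubregisterTruncate(64, 32), "i64 -> i32 reads %eax");
static_assert(!isSubregisterTruncate(32, 1), "i1 has no subregister");
```

Under this model i32 -> i1 is rejected because there is no 1-bit register to read; the value has to be masked instead.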

It looks like the EVT overload was also applying the check to vectors, which was probably unintentional, so I've corrected that here. I think this may have exposed some missing cases in the cost model.
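To illustrate the vector point, a hypothetical stand-in for llvm::EVT is enough (IntVT below is not a real LLVM type). The old rule compared total bit widths, so a v4i64 -> v4i32 truncate, which needs actual shuffle instructions on x86 rather than a register rename, was reported as free. The new rule mirrors the patch's isScalarInteger() guard and width lists.

```cpp
// Hypothetical stand-in for llvm::EVT: an integer type with NumElts lanes of
// EltBits each. NumElts == 1 models a scalar (i64); NumElts > 1 a vector (v4i64).
struct IntVT {
  unsigned NumElts;
  unsigned EltBits;
  constexpr bool isScalarInteger() const { return NumElts == 1; }
  constexpr unsigned getSizeInBits() const { return NumElts * EltBits; }
};

// Old rule: any integer type (scalar or vector) with more total bits is "free".
// (The old isInteger() check is omitted because every IntVT is an integer.)
constexpr bool oldIsTruncateFree(IntVT Src, IntVT Dst) {
  return Src.getSizeInBits() > Dst.getSizeInBits();
}

// New rule, as in the patch: scalars only, and only subregister widths.
constexpr bool newIsTruncateFree(IntVT Src, IntVT Dst) {
  if (!Src.isScalarInteger() || !Dst.isScalarInteger())
    return false;
  unsigned SrcBits = Src.getSizeInBits();
  unsigned DstBits = Dst.getSizeInBits();
  return SrcBits > DstBits &&
         (SrcBits == 64 || SrcBits == 32 || SrcBits == 16) &&
         (DstBits == 32 || DstBits == 16 || DstBits == 8);
}

constexpr IntVT v4i64{4, 64}, v4i32{4, 32};
static_assert(oldIsTruncateFree(v4i64, v4i32), "old rule: 256 > 128 bits");
static_assert(!newIsTruncateFree(v4i64, v4i32), "new rule: vectors excluded");
```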

The avx512-mask-op.ll test changed because we previously promoted the load to 32 bits on the assumption that truncating from i32 to i1 is free. That ultimately allowed the DAG to CSE the two ANDs, since both were then i32. Now one is i32 and the other is i8.
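A small sketch of why i32 -> i8 is a subregister read while i32 -> i1 is not, using plain C++ integers as a model of register contents (the function names are hypothetical). The low byte of a 32-bit value is already a valid i8, but on x86 an i1 kept in a byte register has to be masked to 0 or 1 before it is stored or zero-extended. That masking corresponds to the andb $1 and andl $1 in the new CHECK lines of avx512-mask-op.ll.

```cpp
#include <cstdint>

// i32 -> i8: reading %al instead of %eax. The low byte is already the
// result, so at the register level no instruction is needed.
constexpr std::uint8_t truncToI8(std::uint32_t X) {
  return static_cast<std::uint8_t>(X);  // keeps bits 7..0
}

// i32 -> i1: bits 7..1 of the byte must be cleared explicitly before the
// value can be stored or zero-extended, which costs an AND.
constexpr std::uint8_t truncToI1(std::uint32_t X) {
  return static_cast<std::uint8_t>(X & 1u);
}

static_assert(truncToI1(3u) == 1, "only bit 0 survives");
```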

Diff Detail

Event Timeline

craig.topper created this revision. Sep 26 2017, 4:52 PM

LGTM - the avx512-mask-op.ll regression is an annoyance but a separate problem IMO.

This revision is now accepted and ready to land. Sep 28 2017, 2:24 AM

I'm investigating a perf loss on one of our benchmarks before committing this.

I haven't been able to fix the perf loss on this one benchmark, so I need to hold off on this.

It looks like, in this particular case, the SLP vectorizer was previously creating a v16i64->v16i8 truncate on AVX2, and it no longer does so. Attempts at fudging new entries into the cost model to lower the truncate cost (which was calculated as 11) haven't worked so far.
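As a rough illustration of what adding entries to the cost model involves, here is a hypothetical, simplified lookup table; it is not LLVM's X86 cost table or its CostTableLookup API. Its entries are copied from the AVX2 CHECK lines in the updated trunc.ll test. In this sketch a v16i64 -> v16i8 query finds no entry, so a caller would fall back to a generic estimate; the comment above reports that the real model produced 11 for that case.

```cpp
#include <optional>

// Hypothetical, simplified cost-table entry keyed on (lanes, src bits, dst bits).
struct TruncCostEntry {
  unsigned NumElts;
  unsigned SrcBits;
  unsigned DstBits;
  unsigned Cost;
};

// Values taken from the AVX2 CHECK lines in trunc.ll (illustration only).
constexpr TruncCostEntry AVX2TruncCosts[] = {
    {4, 64, 16, 2}, {8, 64, 16, 5}, {4, 32, 16, 1},
    {4, 64, 8, 2},  {8, 64, 8, 5},
};

// Returns the table cost, or std::nullopt when no entry matches and the caller
// would have to fall back to a generic estimate.
constexpr std::optional<unsigned> lookupTruncCost(unsigned NumElts,
                                                  unsigned SrcBits,
                                                  unsigned DstBits) {
  for (const TruncCostEntry &E : AVX2TruncCosts)
    if (E.NumElts == NumElts && E.SrcBits == SrcBits && E.DstBits == DstBits)
      return E.Cost;
  return std::nullopt;
}

static_assert(*lookupTruncCost(8, 64, 8) == 5, "v8i64 -> v8i8 on AVX2");
```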

RKSimon added inline comments. Dec 9 2017, 8:55 AM

test/Analysis/CostModel/X86/trunc.ll
Line 50: missing AVX512 cost?
Line 94: missing AVX512 cost?

The CHECK lines were added to the test after this review was last updated.

@craig.topper Did you ever get to the cause of the regression you saw?

Revision Contents

Path                                    Size
lib/Target/X86/X86ISelLowering.cpp      24 lines
test/Analysis/CostModel/X86/trunc.ll    6 lines
test/CodeGen/X86/avx512-mask-op.ll      9 lines

Diff 116738

lib/Target/X86/X86ISelLowering.cpp

@@ ... @@ bool X86TargetLowering::isVectorShiftByScalarCheap(Type *Ty) const {
   if (Subtarget.hasInt256() && (Bits == 32 || Bits == 64))
     return false;

   // Otherwise, it's significantly cheaper to shift by a scalar amount than by a
   // fully general vector.
   return true;
 }

-bool X86TargetLowering::isTruncateFree(Type *Ty1, Type *Ty2) const {
-  if (!Ty1->isIntegerTy() || !Ty2->isIntegerTy())
+bool X86TargetLowering::isTruncateFree(Type *SrcTy, Type *DstTy) const {
+  if (!SrcTy->isIntegerTy() || !DstTy->isIntegerTy())
     return false;
-  unsigned NumBits1 = Ty1->getPrimitiveSizeInBits();
-  unsigned NumBits2 = Ty2->getPrimitiveSizeInBits();
-  return NumBits1 > NumBits2;
+  unsigned SrcBits = SrcTy->getPrimitiveSizeInBits();
+  unsigned DstBits = DstTy->getPrimitiveSizeInBits();
+  return SrcBits > DstBits &&
+         (SrcBits == 64 || SrcBits == 32 || SrcBits == 16) &&
+         (DstBits == 32 || DstBits == 16 || DstBits == 8);
 }

 bool X86TargetLowering::allowTruncateForTailCall(Type *Ty1, Type *Ty2) const {
   if (!Ty1->isIntegerTy() || !Ty2->isIntegerTy())
     return false;

   if (!isTypeLegal(EVT::getEVT(Ty1)))
     return false;
@@ ... @@ bool X86TargetLowering::isLegalICmpImmediate(int64_t Imm) const {
   return isInt<32>(Imm);
 }

 bool X86TargetLowering::isLegalAddImmediate(int64_t Imm) const {
   // Can also use sub to handle negated immediates.
   return isInt<32>(Imm);
 }

-bool X86TargetLowering::isTruncateFree(EVT VT1, EVT VT2) const {
-  if (!VT1.isInteger() || !VT2.isInteger())
+bool X86TargetLowering::isTruncateFree(EVT SrcVT, EVT DstVT) const {
+  if (!SrcVT.isScalarInteger() || !DstVT.isScalarInteger())
     return false;
-  unsigned NumBits1 = VT1.getSizeInBits();
-  unsigned NumBits2 = VT2.getSizeInBits();
-  return NumBits1 > NumBits2;
+  unsigned SrcBits = SrcVT.getSizeInBits();
+  unsigned DstBits = DstVT.getSizeInBits();
+  return SrcBits > DstBits &&
+         (SrcBits == 64 || SrcBits == 32 || SrcBits == 16) &&
+         (DstBits == 32 || DstBits == 16 || DstBits == 8);
 }

 bool X86TargetLowering::isZExtFree(Type *Ty1, Type *Ty2) const {
   // x86-64 implicitly zero-extends 32-bit results in 64-bit registers.
   return Ty1->isIntegerTy(32) && Ty2->isIntegerTy(64) && Subtarget.is64Bit();
 }

 bool X86TargetLowering::isZExtFree(EVT VT1, EVT VT2) const {

test/Analysis/CostModel/X86/trunc.ll

@@ ... @@ define i32 @trunc_vXi16() {

 ; SSE: cost of 1 {{.*}} %V4i64 = trunc
 ; AVX1: cost of 4 {{.*}} %V4i64 = trunc
 ; AVX2: cost of 2 {{.*}} %V4i64 = trunc
 ; AVX512: cost of 2 {{.*}} %V4i64 = trunc
   %V4i64 = trunc <4 x i64> undef to <4 x i16>

 ; SSE: cost of 3 {{.*}} %V8i64 = trunc
-; AVX: cost of 0 {{.*}} %V8i64 = trunc
+; AVX1: cost of 9 {{.*}} %V8i64 = trunc
+; AVX2: cost of 5 {{.*}} %V8i64 = trunc
   %V8i64 = trunc <8 x i64> undef to <8 x i16>
[RKSimon, inline comment] missing AVX512 cost?

 ; SSE2: cost of 3 {{.*}} %V4i32 = trunc
 ; SSSE3: cost of 3 {{.*}} %V4i32 = trunc
 ; SSE42: cost of 1 {{.*}} %V4i32 = trunc
 ; AVX1: cost of 1 {{.*}} %V4i32 = trunc
 ; AVX2: cost of 1 {{.*}} %V4i32 = trunc
 ; AVX512: cost of 1 {{.*}} %V4i32 = trunc
   %V4i32 = trunc <4 x i32> undef to <4 x i16>
@@ ... @@ define i32 @trunc_vXi8() {

 ; SSE: cost of 1 {{.*}} %V4i64 = trunc
 ; AVX1: cost of 4 {{.*}} %V4i64 = trunc
 ; AVX2: cost of 2 {{.*}} %V4i64 = trunc
 ; AVX512: cost of 2 {{.*}} %V4i64 = trunc
   %V4i64 = trunc <4 x i64> undef to <4 x i8>

 ; SSE: cost of 3 {{.*}} %V8i64 = trunc
-; AVX: cost of 0 {{.*}} %V8i64 = trunc
+; AVX1: cost of 9 {{.*}} %V8i64 = trunc
+; AVX2: cost of 5 {{.*}} %V8i64 = trunc
   %V8i64 = trunc <8 x i64> undef to <8 x i8>
[RKSimon, inline comment] missing AVX512 cost?

 ; SSE: cost of 0 {{.*}} %V2i32 = trunc
 ; AVX: cost of 0 {{.*}} %V2i32 = trunc
   %V2i32 = trunc <2 x i32> undef to <2 x i8>

 ; SSE2: cost of 3 {{.*}} %V4i32 = trunc
 ; SSSE3: cost of 3 {{.*}} %V4i32 = trunc
 ; SSE42: cost of 1 {{.*}} %V4i32 = trunc

test/CodeGen/X86/avx512-mask-op.ll

@@ ... @@
 ; f2(v);
 ;}

 @f1.v = internal unnamed_addr global i1 false, align 4

 define void @f1(i32 %c) {
 ; CHECK-LABEL: f1:
 ; CHECK: ## BB#0: ## %entry
-; CHECK-NEXT: movzbl {{.*}}(%rip), %edi
-; CHECK-NEXT: xorl $1, %edi
-; CHECK-NEXT: movb %dil, {{.*}}(%rip)
+; CHECK-NEXT: movb {{.*}}(%rip), %al
+; CHECK-NEXT: xorb $1, %al
+; CHECK-NEXT: movzbl %al, %edi
+; CHECK-NEXT: andb $1, %al
+; CHECK-NEXT: movb %al, {{.*}}(%rip)
+; CHECK-NEXT: andl $1, %edi
 ; CHECK-NEXT: jmp _f2 ## TAILCALL
 entry:
   %.b1 = load i1, i1* @f1.v, align 4
   %not..b1 = xor i1 %.b1, true
   store i1 %not..b1, i1* @f1.v, align 4
   %0 = zext i1 %not..b1 to i32
   tail call void @f2(i32 %0) #2
   ret void
