This is an archive of the discontinued LLVM Phabricator instance.

[CostModel] Return TCC_Expensive for non-speculatable ctlz/cttz.
Needs Review · Public

Authored by fhahn on Oct 16 2020, 11:35 AM.

Details

Reviewers

RKSimon
spatel
craig.topper
samparker
lebedev.ri

Summary

Before 871556a494552c0f503eec17055f075bcd859937, we returned
TCC_Expensive for non-speculatable CTTZ/CTLZ, but that patch removed the
exit. I am not sure whether that was intentional, but as a result we now
also treat non-speculatable CTLZ/CTTZ as non-expensive.

See the change in the test case. The current speculation limit in
SimplifyCFG is set so that a single expensive instruction can be
speculated, but in test9_loop an expensive and a cheap instruction
need speculating, which pushes the total over the limit.
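
The budget arithmetic described above can be sketched as follows. This is only an illustration, not the SimplifyCFG code: the helper name and the budget of 4 units are assumptions, while the TCC_* values mirror LLVM's TargetTransformInfo cost constants.

```cpp
#include <cassert>
#include <vector>

// Cost constants mirroring TargetTransformInfo::TargetCostConstants.
enum TargetCost : int { TCC_Free = 0, TCC_Basic = 1, TCC_Expensive = 4 };

// Hypothetical helper (not an LLVM API): returns true if all instructions
// whose costs are listed in `costs` can be speculated without the running
// total exceeding `budget`.
inline bool fitsSpeculationBudget(const std::vector<int> &costs, int budget) {
  assert(budget >= 0 && "budget must be non-negative");
  int remaining = budget;
  for (int c : costs) {
    if (c > remaining)
      return false; // this instruction would push us over the limit
    remaining -= c;
  }
  return true;
}
```

With an assumed budget of 4, a lone TCC_Expensive instruction fits, while TCC_Expensive plus TCC_Basic does not, which is the situation described for test9_loop.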

Note that the X86 backend currently considers CTTZ/CTLZ cheap to speculate
on architectures like Haswell or Skylake, where it is actually quite expensive.
But that is a separate issue.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. Oct 16 2020, 11:35 AM

Herald added a project: Restricted Project. Oct 16 2020, 11:35 AM

Herald added a subscriber: pengfei.

fhahn requested review of this revision. Oct 16 2020, 11:35 AM

Harbormaster completed remote builds in B75346: Diff 298694. Oct 16 2020, 11:36 AM

Doesn't this affect tests in test/Analysis/CostModel/X86 ?
We might need to enclose the speculatable -> basic vs. expensive
difference inside the vector checks.

In D89578#2335559, @spatel wrote:

Doesn't this affect tests in test/Analysis/CostModel/X86 ?
We might need to enclose the speculatable -> basic vs. expensive
difference inside the vector checks.

I originally only updated the simplifycfg checks. Now all impacted tests should be updated.

Harbormaster completed remote builds in B75354: Diff 298717. Oct 16 2020, 12:22 PM

This might be restoring the old behavior, but I want to make sure we are ok with the potential regressions. If not, we should make some adjustments to the x86 model first.

The SLP v4i32 diff with zero-is-undef set would be something like this, and it's hard to justify IMO:

	bsfl	src32(%rip), %eax
	bsfl	src32+4(%rip), %ecx
	bsfl	src32+8(%rip), %edx
	bsfl	src32+12(%rip), %esi
	movl	%eax, dst32(%rip)
	movl	%ecx, dst32+4(%rip)
	movl	%edx, dst32+8(%rip)
	movl	%esi, dst32+12(%rip)
	retq

vs.

	movdqa	src32(%rip), %xmm0
	pcmpeqd	%xmm1, %xmm1
	paddd	%xmm0, %xmm1
	pandn	%xmm1, %xmm0
	movdqa	%xmm0, %xmm1
	psrlw	$1, %xmm1
	pand	.LCPI0_0(%rip), %xmm1
	psubb	%xmm1, %xmm0
	movdqa	.LCPI0_1(%rip), %xmm1           # xmm1 = [51,51,51,51,51,51,51,51,51,51,51,51,51,51,51,51]
	movdqa	%xmm0, %xmm2
	pand	%xmm1, %xmm2
	psrlw	$2, %xmm0
	pand	%xmm1, %xmm0
	paddb	%xmm2, %xmm0
	movdqa	%xmm0, %xmm1
	psrlw	$4, %xmm1
	paddb	%xmm0, %xmm1
	pand	.LCPI0_2(%rip), %xmm1
	pxor	%xmm0, %xmm0
	movdqa	%xmm1, %xmm2
	punpckhdq	%xmm0, %xmm2            # xmm2 = xmm2[2],xmm0[2],xmm2[3],xmm0[3]
	psadbw	%xmm0, %xmm2
	punpckldq	%xmm0, %xmm1            # xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
	psadbw	%xmm0, %xmm1
	packuswb	%xmm2, %xmm1
	movdqa	%xmm1, dst32(%rip)
	retq
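
For readers following the vector sequence: the first three arithmetic instructions (pcmpeqd, paddd, pandn) compute ~x & (x - 1), and the rest is a bit-parallel population count, so each lane is evaluating cttz(x) = popcount(~x & (x - 1)). A scalar sketch of that identity is below. The function names are made up for illustration; the 0x33 mask corresponds to the 51s visible in the listing, the other two masks are the usual ones and are assumed here, and the final byte sum uses a multiply where the listing uses psadbw.

```cpp
#include <cassert>
#include <cstdint>

// Bit-parallel population count of a 32-bit value.
inline uint32_t popcount32(uint32_t v) {
  v = v - ((v >> 1) & 0x55555555u);                 // 2-bit partial sums
  v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); // 4-bit partial sums
  v = (v + (v >> 4)) & 0x0f0f0f0fu;                 // 8-bit partial sums
  return (v * 0x01010101u) >> 24;                   // add the four bytes
}

// Count trailing zeros via popcount: ~x & (x - 1) keeps exactly the bits
// below the lowest set bit of x. For x == 0 this yields 32.
inline uint32_t cttzViaPopcount(uint32_t x) {
  return popcount32(~x & (x - 1u));
}
```

Unlike a bare bsf, this form is well defined for a zero input, which is part of why the expansion is so much longer than the scalar code.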

llvm/test/CodeGen/X86/dagcombine-select.ll
438–441	I think we would view this as a regression based on https://bugs.llvm.org/PR46203 / 2328cab16ccd8f17fee782c29fb844662c089fbb. Do we need to adjust the isCheapToSpeculateXXXX APIs to acknowledge the zero-is-undef parameter of the intrinsic?

RKSimon added inline comments. Oct 22 2020, 1:53 AM

llvm/test/CodeGen/X86/dagcombine-select.ll
438–441	Yes, it feels like we're going to need some combo of checks for zero-is-undef and if the value is known-never-zero if we can.
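
One possible shape for the check discussed in these inline comments is sketched below. It is purely hypothetical: the function and its parameters do not exist in LLVM, and it only illustrates how zero-is-undef and known-never-zero information could feed into a "cheap to speculate" decision.

```cpp
#include <cassert>

// Hypothetical predicate, not an LLVM API.
//   HasZeroDefinedInsn: the target has an instruction whose result is
//                       defined for a zero input (a tzcnt-style instruction).
//   ZeroIsUndef:        the intrinsic's second operand is true.
//   KnownNeverZero:     value tracking proved the input is non-zero.
inline bool cheapToSpeculateCttzSketch(bool HasZeroDefinedInsn,
                                       bool ZeroIsUndef,
                                       bool KnownNeverZero) {
  if (HasZeroDefinedInsn)
    return true; // no zero guard is ever needed
  // A bsf-style instruction needs no extra guard when the zero case
  // either cannot happen or is allowed to produce any value.
  return ZeroIsUndef || KnownNeverZero;
}
```

Under this assumption, the cttz calls in the affected dagcombine-select.ll tests (which pass i1 true) would still count as cheap on targets without BMI.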

This review seems to be stuck/dead, consider abandoning if no longer relevant.

Herald added a project: Restricted Project. Jan 12 2023, 5:19 PM

Herald added subscribers: pcwang-thead, StephenFan.

Revision Contents

Path	Size
llvm/include/llvm/CodeGen/BasicTTIImpl.h	14 lines
llvm/test/Analysis/CostModel/X86/cttz.ll	16 lines
llvm/test/Analysis/CostModel/X86/intrinsic-cost-kinds.ll	6 lines
llvm/test/CodeGen/X86/dagcombine-select.ll	16 lines
llvm/test/Transforms/SLPVectorizer/X86/cttz.ll	136 lines
llvm/test/Transforms/SimplifyCFG/X86/speculate-cttz-ctlz.ll	74 lines

Diff 298717

llvm/include/llvm/CodeGen/BasicTTIImpl.h

[...]
unsigned getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
default:		default:
// FIXME: all cost kinds should default to the same thing?		// FIXME: all cost kinds should default to the same thing?
if (CostKind != TTI::TCK_RecipThroughput)		if (CostKind != TTI::TCK_RecipThroughput)
return BaseT::getIntrinsicInstrCost(ICA, CostKind);		return BaseT::getIntrinsicInstrCost(ICA, CostKind);
break;		break;

case Intrinsic::cttz:		case Intrinsic::cttz:
// FIXME: If necessary, this should go in target-specific overrides.		// FIXME: If necessary, this should go in target-specific overrides.
if (VF == 1 && RetVF == 1 && getTLI()->isCheapToSpeculateCttz())		if (VF == 1 && RetVF == 1) {
		if (getTLI()->isCheapToSpeculateCttz())
return TargetTransformInfo::TCC_Basic;		return TargetTransformInfo::TCC_Basic;
		return TargetTransformInfo::TCC_Expensive;
		}
break;		break;

case Intrinsic::ctlz:		case Intrinsic::ctlz:
// FIXME: If necessary, this should go in target-specific overrides.		// FIXME: If necessary, this should go in target-specific overrides.
if (VF == 1 && RetVF == 1 && getTLI()->isCheapToSpeculateCtlz())		if (VF == 1 && RetVF == 1) {
		if (getTLI()->isCheapToSpeculateCtlz())
return TargetTransformInfo::TCC_Basic;		return TargetTransformInfo::TCC_Basic;
		return TargetTransformInfo::TCC_Expensive;
		}
break;		break;

case Intrinsic::memcpy:		case Intrinsic::memcpy:
// FIXME: all cost kinds should default to the same thing?		// FIXME: all cost kinds should default to the same thing?
if (CostKind != TTI::TCK_RecipThroughput)		if (CostKind != TTI::TCK_RecipThroughput)
return thisT()->getMemcpyCost(ICA.getInst());		return thisT()->getMemcpyCost(ICA.getInst());
return BaseT::getIntrinsicInstrCost(ICA, CostKind);		return BaseT::getIntrinsicInstrCost(ICA, CostKind);

[...]
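
The right-hand (new) side of the cttz and ctlz cases above boils down to the following decision. This is a standalone sketch rather than the BasicTTIImpl.h code: the function name and the sentinel are assumptions made for illustration, and the TCC_* values mirror LLVM's TargetTransformInfo cost constants.

```cpp
#include <cassert>

// Cost constants mirroring TargetTransformInfo::TargetCostConstants.
enum TargetCost : int { TCC_Basic = 1, TCC_Expensive = 4 };

// Hypothetical sentinel meaning "not a scalar call, use the generic path".
constexpr int kNoScalarCost = -1;

// Sketch of the patched logic for a cttz/ctlz call.
//   VF, RetVF:        vector factors of the argument and the result.
//   CheapToSpeculate: what isCheapToSpeculateCttz()/Ctlz() would return.
inline int scalarCountZerosCost(unsigned VF, unsigned RetVF,
                                bool CheapToSpeculate) {
  if (VF == 1 && RetVF == 1)
    return CheapToSpeculate ? TCC_Basic : TCC_Expensive;
  return kNoScalarCost; // corresponds to the `break` in the switch
}
```

Before the patch, the scalar non-speculatable case also fell through to the generic path instead of returning TCC_Expensive, which is why the NOBMI scalar costs in the tests below change from 3 to 4.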

llvm/test/Analysis/CostModel/X86/cttz.ll

[...]

	declare i64 @llvm.cttz.i64(i64, i1)			declare i64 @llvm.cttz.i64(i64, i1)
	declare i32 @llvm.cttz.i32(i32, i1)			declare i32 @llvm.cttz.i32(i32, i1)
	declare i16 @llvm.cttz.i16(i16, i1)			declare i16 @llvm.cttz.i16(i16, i1)
	declare i8 @llvm.cttz.i8(i8, i1)			declare i8 @llvm.cttz.i8(i8, i1)

	define i64 @var_cttz_i64(i64 %a) {			define i64 @var_cttz_i64(i64 %a) {
	; NOBMI-LABEL: 'var_cttz_i64'			; NOBMI-LABEL: 'var_cttz_i64'
	; NOBMI-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %cttz = call i64 @llvm.cttz.i64(i64 %a, i1 false)			; NOBMI-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %cttz = call i64 @llvm.cttz.i64(i64 %a, i1 false)
	; NOBMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %cttz			; NOBMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %cttz
	;			;
	; BMI-LABEL: 'var_cttz_i64'			; BMI-LABEL: 'var_cttz_i64'
	; BMI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cttz = call i64 @llvm.cttz.i64(i64 %a, i1 false)			; BMI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cttz = call i64 @llvm.cttz.i64(i64 %a, i1 false)
	; BMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %cttz			; BMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %cttz
	;			;
	%cttz = call i64 @llvm.cttz.i64(i64 %a, i1 0)			%cttz = call i64 @llvm.cttz.i64(i64 %a, i1 0)
	ret i64 %cttz			ret i64 %cttz
	}			}

	define i64 @var_cttz_i64u(i64 %a) {			define i64 @var_cttz_i64u(i64 %a) {
	; NOBMI-LABEL: 'var_cttz_i64u'			; NOBMI-LABEL: 'var_cttz_i64u'
	; NOBMI-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %cttz = call i64 @llvm.cttz.i64(i64 %a, i1 true)			; NOBMI-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %cttz = call i64 @llvm.cttz.i64(i64 %a, i1 true)
	; NOBMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %cttz			; NOBMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %cttz
	;			;
	; BMI-LABEL: 'var_cttz_i64u'			; BMI-LABEL: 'var_cttz_i64u'
	; BMI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cttz = call i64 @llvm.cttz.i64(i64 %a, i1 true)			; BMI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cttz = call i64 @llvm.cttz.i64(i64 %a, i1 true)
	; BMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %cttz			; BMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %cttz
	;			;
	%cttz = call i64 @llvm.cttz.i64(i64 %a, i1 1)			%cttz = call i64 @llvm.cttz.i64(i64 %a, i1 1)
	ret i64 %cttz			ret i64 %cttz
	}			}

	define i32 @var_cttz_i32(i32 %a) {			define i32 @var_cttz_i32(i32 %a) {
	; NOBMI-LABEL: 'var_cttz_i32'			; NOBMI-LABEL: 'var_cttz_i32'
	; NOBMI-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %cttz = call i32 @llvm.cttz.i32(i32 %a, i1 false)			; NOBMI-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %cttz = call i32 @llvm.cttz.i32(i32 %a, i1 false)
	; NOBMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %cttz			; NOBMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %cttz
	;			;
	; BMI-LABEL: 'var_cttz_i32'			; BMI-LABEL: 'var_cttz_i32'
	; BMI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cttz = call i32 @llvm.cttz.i32(i32 %a, i1 false)			; BMI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cttz = call i32 @llvm.cttz.i32(i32 %a, i1 false)
	; BMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %cttz			; BMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %cttz
	;			;
	%cttz = call i32 @llvm.cttz.i32(i32 %a, i1 0)			%cttz = call i32 @llvm.cttz.i32(i32 %a, i1 0)
	ret i32 %cttz			ret i32 %cttz
	}			}

	define i32 @var_cttz_i32u(i32 %a) {			define i32 @var_cttz_i32u(i32 %a) {
	; NOBMI-LABEL: 'var_cttz_i32u'			; NOBMI-LABEL: 'var_cttz_i32u'
	; NOBMI-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %cttz = call i32 @llvm.cttz.i32(i32 %a, i1 true)			; NOBMI-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %cttz = call i32 @llvm.cttz.i32(i32 %a, i1 true)
	; NOBMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %cttz			; NOBMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %cttz
	;			;
	; BMI-LABEL: 'var_cttz_i32u'			; BMI-LABEL: 'var_cttz_i32u'
	; BMI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cttz = call i32 @llvm.cttz.i32(i32 %a, i1 true)			; BMI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cttz = call i32 @llvm.cttz.i32(i32 %a, i1 true)
	; BMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %cttz			; BMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %cttz
	;			;
	%cttz = call i32 @llvm.cttz.i32(i32 %a, i1 1)			%cttz = call i32 @llvm.cttz.i32(i32 %a, i1 1)
	ret i32 %cttz			ret i32 %cttz
	}			}

	define i16 @var_cttz_i16(i16 %a) {			define i16 @var_cttz_i16(i16 %a) {
	; NOBMI-LABEL: 'var_cttz_i16'			; NOBMI-LABEL: 'var_cttz_i16'
	; NOBMI-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %cttz = call i16 @llvm.cttz.i16(i16 %a, i1 false)			; NOBMI-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %cttz = call i16 @llvm.cttz.i16(i16 %a, i1 false)
	; NOBMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i16 %cttz			; NOBMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i16 %cttz
	;			;
	; BMI-LABEL: 'var_cttz_i16'			; BMI-LABEL: 'var_cttz_i16'
	; BMI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cttz = call i16 @llvm.cttz.i16(i16 %a, i1 false)			; BMI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cttz = call i16 @llvm.cttz.i16(i16 %a, i1 false)
	; BMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i16 %cttz			; BMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i16 %cttz
	;			;
	%cttz = call i16 @llvm.cttz.i16(i16 %a, i1 0)			%cttz = call i16 @llvm.cttz.i16(i16 %a, i1 0)
	ret i16 %cttz			ret i16 %cttz
	}			}

	define i16 @var_cttz_i16u(i16 %a) {			define i16 @var_cttz_i16u(i16 %a) {
	; NOBMI-LABEL: 'var_cttz_i16u'			; NOBMI-LABEL: 'var_cttz_i16u'
	; NOBMI-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %cttz = call i16 @llvm.cttz.i16(i16 %a, i1 true)			; NOBMI-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %cttz = call i16 @llvm.cttz.i16(i16 %a, i1 true)
	; NOBMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i16 %cttz			; NOBMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i16 %cttz
	;			;
	; BMI-LABEL: 'var_cttz_i16u'			; BMI-LABEL: 'var_cttz_i16u'
	; BMI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cttz = call i16 @llvm.cttz.i16(i16 %a, i1 true)			; BMI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cttz = call i16 @llvm.cttz.i16(i16 %a, i1 true)
	; BMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i16 %cttz			; BMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i16 %cttz
	;			;
	%cttz = call i16 @llvm.cttz.i16(i16 %a, i1 1)			%cttz = call i16 @llvm.cttz.i16(i16 %a, i1 1)
	ret i16 %cttz			ret i16 %cttz
	}			}

	define i8 @var_cttz_i8(i8 %a) {			define i8 @var_cttz_i8(i8 %a) {
	; NOBMI-LABEL: 'var_cttz_i8'			; NOBMI-LABEL: 'var_cttz_i8'
	; NOBMI-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %cttz = call i8 @llvm.cttz.i8(i8 %a, i1 false)			; NOBMI-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %cttz = call i8 @llvm.cttz.i8(i8 %a, i1 false)
	; NOBMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %cttz			; NOBMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %cttz
	;			;
	; BMI-LABEL: 'var_cttz_i8'			; BMI-LABEL: 'var_cttz_i8'
	; BMI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cttz = call i8 @llvm.cttz.i8(i8 %a, i1 false)			; BMI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cttz = call i8 @llvm.cttz.i8(i8 %a, i1 false)
	; BMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %cttz			; BMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %cttz
	;			;
	%cttz = call i8 @llvm.cttz.i8(i8 %a, i1 0)			%cttz = call i8 @llvm.cttz.i8(i8 %a, i1 0)
	ret i8 %cttz			ret i8 %cttz
	}			}

	define i8 @var_cttz_i8u(i8 %a) {			define i8 @var_cttz_i8u(i8 %a) {
	; NOBMI-LABEL: 'var_cttz_i8u'			; NOBMI-LABEL: 'var_cttz_i8u'
	; NOBMI-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %cttz = call i8 @llvm.cttz.i8(i8 %a, i1 true)			; NOBMI-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %cttz = call i8 @llvm.cttz.i8(i8 %a, i1 true)
	; NOBMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %cttz			; NOBMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %cttz
	;			;
	; BMI-LABEL: 'var_cttz_i8u'			; BMI-LABEL: 'var_cttz_i8u'
	; BMI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cttz = call i8 @llvm.cttz.i8(i8 %a, i1 true)			; BMI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cttz = call i8 @llvm.cttz.i8(i8 %a, i1 true)
	; BMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %cttz			; BMI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %cttz
	;			;
	%cttz = call i8 @llvm.cttz.i8(i8 %a, i1 1)			%cttz = call i8 @llvm.cttz.i8(i8 %a, i1 1)
	ret i8 %cttz			ret i8 %cttz
[...]

llvm/test/Analysis/CostModel/X86/intrinsic-cost-kinds.ll

[...]
	;			;
	%s = call float @llvm.fmuladd.f32(float %a, float %b, float %c)			%s = call float @llvm.fmuladd.f32(float %a, float %b, float %c)
	%v = call <16 x float> @llvm.fmuladd.v16f32(<16 x float> %va, <16 x float> %vb, <16 x float> %vc)			%v = call <16 x float> @llvm.fmuladd.v16f32(<16 x float> %va, <16 x float> %vb, <16 x float> %vc)
	ret void			ret void
	}			}

	define void @cttz(i32 %a, <16 x i32> %va) {			define void @cttz(i32 %a, <16 x i32> %va) {
	; THRU-LABEL: 'cttz'			; THRU-LABEL: 'cttz'
	; THRU-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %s = call i32 @llvm.cttz.i32(i32 %a, i1 false)			; THRU-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %s = call i32 @llvm.cttz.i32(i32 %a, i1 false)
	; THRU-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %v = call <16 x i32> @llvm.cttz.v16i32(<16 x i32> %va, i1 false)			; THRU-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %v = call <16 x i32> @llvm.cttz.v16i32(<16 x i32> %va, i1 false)
	; THRU-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; THRU-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; LATE-LABEL: 'cttz'			; LATE-LABEL: 'cttz'
	; LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s = call i32 @llvm.cttz.i32(i32 %a, i1 false)			; LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s = call i32 @llvm.cttz.i32(i32 %a, i1 false)
	; LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v = call <16 x i32> @llvm.cttz.v16i32(<16 x i32> %va, i1 false)			; LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v = call <16 x i32> @llvm.cttz.v16i32(<16 x i32> %va, i1 false)
	; LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void			; LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
	;			;
	; SIZE-LABEL: 'cttz'			; SIZE-LABEL: 'cttz'
	; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %s = call i32 @llvm.cttz.i32(i32 %a, i1 false)			; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %s = call i32 @llvm.cttz.i32(i32 %a, i1 false)
	; SIZE-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %v = call <16 x i32> @llvm.cttz.v16i32(<16 x i32> %va, i1 false)			; SIZE-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %v = call <16 x i32> @llvm.cttz.v16i32(<16 x i32> %va, i1 false)
	; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void			; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
	;			;
	; SIZE_LATE-LABEL: 'cttz'			; SIZE_LATE-LABEL: 'cttz'
	; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %s = call i32 @llvm.cttz.i32(i32 %a, i1 false)			; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %s = call i32 @llvm.cttz.i32(i32 %a, i1 false)
	; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %v = call <16 x i32> @llvm.cttz.v16i32(<16 x i32> %va, i1 false)			; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %v = call <16 x i32> @llvm.cttz.v16i32(<16 x i32> %va, i1 false)
	; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void			; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
	;			;
	%s = call i32 @llvm.cttz.i32(i32 %a, i1 false)			%s = call i32 @llvm.cttz.i32(i32 %a, i1 false)
	%v = call <16 x i32> @llvm.cttz.v16i32(<16 x i32> %va, i1 false)			%v = call <16 x i32> @llvm.cttz.v16i32(<16 x i32> %va, i1 false)
	ret void			ret void
	}			}

[...]

llvm/test/CodeGen/X86/dagcombine-select.ll

[...]
; BMI-NEXT: retq
%add = select i1 %tobool, i32 %.op, i32 0		%add = select i1 %tobool, i32 %.op, i32 0
ret i32 %add		ret i32 %add
}		}

; This matches the pattern emitted for __builtin_ffs - 1		; This matches the pattern emitted for __builtin_ffs - 1
define i32 @cttz_32_eq_select_ffs_m1(i32 %v) nounwind {		define i32 @cttz_32_eq_select_ffs_m1(i32 %v) nounwind {
; NOBMI-LABEL: cttz_32_eq_select_ffs_m1:		; NOBMI-LABEL: cttz_32_eq_select_ffs_m1:
; NOBMI: # %bb.0:		; NOBMI: # %bb.0:
; NOBMI-NEXT: bsfl %edi, %ecx		; NOBMI-NEXT: testl %edi, %edi
		; NOBMI-NEXT: je .LBB26_2
		; NOBMI-NEXT: # %bb.1: # %select.false.sink
		; NOBMI-NEXT: bsfl %edi, %eax
		; NOBMI-NEXT: retq
		; NOBMI-NEXT: .LBB26_2: # %select.end
; NOBMI-NEXT: movl $-1, %eax		; NOBMI-NEXT: movl $-1, %eax
; NOBMI-NEXT: cmovnel %ecx, %eax
; NOBMI-NEXT: retq		; NOBMI-NEXT: retq
;		;
; BMI-LABEL: cttz_32_eq_select_ffs_m1:		; BMI-LABEL: cttz_32_eq_select_ffs_m1:
; BMI: # %bb.0:		; BMI: # %bb.0:
; BMI-NEXT: tzcntl %edi, %ecx		; BMI-NEXT: tzcntl %edi, %ecx
; BMI-NEXT: movl $-1, %eax		; BMI-NEXT: movl $-1, %eax
; BMI-NEXT: cmovael %ecx, %eax		; BMI-NEXT: cmovael %ecx, %eax
; BMI-NEXT: retq		; BMI-NEXT: retq

%cnt = tail call i32 @llvm.cttz.i32(i32 %v, i1 true)		%cnt = tail call i32 @llvm.cttz.i32(i32 %v, i1 true)
%tobool = icmp eq i32 %v, 0		%tobool = icmp eq i32 %v, 0
%sel = select i1 %tobool, i32 -1, i32 %cnt		%sel = select i1 %tobool, i32 -1, i32 %cnt
ret i32 %sel		ret i32 %sel
}		}

define i32 @cttz_32_ne_select_ffs_m1(i32 %v) nounwind {		define i32 @cttz_32_ne_select_ffs_m1(i32 %v) nounwind {
; NOBMI-LABEL: cttz_32_ne_select_ffs_m1:		; NOBMI-LABEL: cttz_32_ne_select_ffs_m1:
; NOBMI: # %bb.0:		; NOBMI: # %bb.0:
; NOBMI-NEXT: bsfl %edi, %ecx		; NOBMI-NEXT: testl %edi, %edi
		; NOBMI-NEXT: je .LBB27_2
		; NOBMI-NEXT: # %bb.1: # %select.true.sink
		; NOBMI-NEXT: bsfl %edi, %eax
		; NOBMI-NEXT: retq
		; NOBMI-NEXT: .LBB27_2: # %select.end
; NOBMI-NEXT: movl $-1, %eax		; NOBMI-NEXT: movl $-1, %eax
; NOBMI-NEXT: cmovnel %ecx, %eax
; NOBMI-NEXT: retq		; NOBMI-NEXT: retq
;		;
; BMI-LABEL: cttz_32_ne_select_ffs_m1:		; BMI-LABEL: cttz_32_ne_select_ffs_m1:
; BMI: # %bb.0:		; BMI: # %bb.0:
; BMI-NEXT: tzcntl %edi, %ecx		; BMI-NEXT: tzcntl %edi, %ecx
; BMI-NEXT: movl $-1, %eax		; BMI-NEXT: movl $-1, %eax
; BMI-NEXT: cmovael %ecx, %eax		; BMI-NEXT: cmovael %ecx, %eax
; BMI-NEXT: retq		; BMI-NEXT: retq

%cnt = tail call i32 @llvm.cttz.i32(i32 %v, i1 true)		%cnt = tail call i32 @llvm.cttz.i32(i32 %v, i1 true)
%tobool = icmp ne i32 %v, 0		%tobool = icmp ne i32 %v, 0
%sel = select i1 %tobool, i32 %cnt, i32 -1		%sel = select i1 %tobool, i32 %cnt, i32 -1
ret i32 %sel		ret i32 %sel
}		}
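
The two tests above select between -1 and cttz(v) depending on whether v is zero, which is the __builtin_ffs(v) - 1 pattern mentioned in the test comment. A small sketch of the source-level equivalent is shown here; it assumes a GCC- or Clang-compatible compiler for the builtins, and the function name is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Equivalent of select(v == 0, -1, cttz(v, /*zero-is-undef=*/true)).
// The zero case is handled explicitly because __builtin_ctz(0) is
// undefined, just like cttz with its second operand set to true.
inline int ffsMinusOne(uint32_t v) {
  return v == 0 ? -1 : __builtin_ctz(v);
}
```

Because the zero input never reaches the count instruction's result, a bsf followed by a cmov (the old NOBMI output) computes the same value as the branchy sequence in the new NOBMI output.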

llvm/test/Transforms/SLPVectorizer/X86/cttz.ll

[...]
;
store i64 %cttz0, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i64 0, i64 0), align 4		store i64 %cttz0, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i64 0, i64 0), align 4
store i64 %cttz1, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i64 0, i64 1), align 4		store i64 %cttz1, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i64 0, i64 1), align 4
store i64 %cttz2, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i64 0, i64 2), align 4		store i64 %cttz2, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i64 0, i64 2), align 4
store i64 %cttz3, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i64 0, i64 3), align 4		store i64 %cttz3, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i64 0, i64 3), align 4
ret void		ret void
}		}

define void @cttz_4i32() #0 {		define void @cttz_4i32() #0 {
; SSE2-LABEL: @cttz_4i32(		; SSE-LABEL: @cttz_4i32(
; SSE2-NEXT: [[LD0:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 0), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 4
; SSE2-NEXT: [[LD1:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 1), align 4		; SSE-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP1]], i1 false)
; SSE2-NEXT: [[LD2:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 2), align 4		; SSE-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 4
; SSE2-NEXT: [[LD3:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 3), align 4		; SSE-NEXT: ret void
; SSE2-NEXT: [[CTTZ0:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD0]], i1 false)
; SSE2-NEXT: [[CTTZ1:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD1]], i1 false)
; SSE2-NEXT: [[CTTZ2:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD2]], i1 false)
; SSE2-NEXT: [[CTTZ3:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD3]], i1 false)
; SSE2-NEXT: store i32 [[CTTZ0]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 0), align 4
; SSE2-NEXT: store i32 [[CTTZ1]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 1), align 4
; SSE2-NEXT: store i32 [[CTTZ2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 2), align 4
; SSE2-NEXT: store i32 [[CTTZ3]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 3), align 4
; SSE2-NEXT: ret void
;
; SSE42-LABEL: @cttz_4i32(
; SSE42-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 4
; SSE42-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP1]], i1 false)
; SSE42-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 4
; SSE42-NEXT: ret void
;		;
; AVX1-LABEL: @cttz_4i32(		; AVX1-LABEL: @cttz_4i32(
; AVX1-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 4		; AVX1-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 4
; AVX1-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP1]], i1 false)		; AVX1-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP1]], i1 false)
; AVX1-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 4		; AVX1-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 4
; AVX1-NEXT: ret void		; AVX1-NEXT: ret void
;		;
; AVX2-LABEL: @cttz_4i32(		; AVX2-LABEL: @cttz_4i32(
[...]
;
store i32 %cttz0, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 0), align 4		store i32 %cttz0, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 0), align 4
store i32 %cttz1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 1), align 4		store i32 %cttz1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 1), align 4
store i32 %cttz2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 2), align 4		store i32 %cttz2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 2), align 4
store i32 %cttz3, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 3), align 4		store i32 %cttz3, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @cttz_8i32() #0 {		define void @cttz_8i32() #0 {
; SSE2-LABEL: @cttz_8i32(		; SSE-LABEL: @cttz_8i32(
; SSE2-NEXT: [[LD0:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 0), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 2
; SSE2-NEXT: [[LD1:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 1), align 2		; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE2-NEXT: [[LD2:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 2), align 2		; SSE-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP1]], i1 false)
; SSE2-NEXT: [[LD3:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 3), align 2		; SSE-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP2]], i1 false)
; SSE2-NEXT: [[LD4:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 4), align 2		; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 2
; SSE2-NEXT: [[LD5:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 5), align 2		; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE2-NEXT: [[LD6:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 6), align 2		; SSE-NEXT: ret void
; SSE2-NEXT: [[LD7:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 7), align 2
; SSE2-NEXT: [[CTTZ0:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD0]], i1 false)
; SSE2-NEXT: [[CTTZ1:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD1]], i1 false)
; SSE2-NEXT: [[CTTZ2:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD2]], i1 false)
; SSE2-NEXT: [[CTTZ3:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD3]], i1 false)
; SSE2-NEXT: [[CTTZ4:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD4]], i1 false)
; SSE2-NEXT: [[CTTZ5:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD5]], i1 false)
; SSE2-NEXT: [[CTTZ6:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD6]], i1 false)
; SSE2-NEXT: [[CTTZ7:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD7]], i1 false)
; SSE2-NEXT: store i32 [[CTTZ0]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 0), align 2
; SSE2-NEXT: store i32 [[CTTZ1]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 1), align 2
; SSE2-NEXT: store i32 [[CTTZ2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 2), align 2
; SSE2-NEXT: store i32 [[CTTZ3]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 3), align 2
; SSE2-NEXT: store i32 [[CTTZ4]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4), align 2
; SSE2-NEXT: store i32 [[CTTZ5]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 5), align 2
; SSE2-NEXT: store i32 [[CTTZ6]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 6), align 2
; SSE2-NEXT: store i32 [[CTTZ7]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 7), align 2
; SSE2-NEXT: ret void
;
; SSE42-LABEL: @cttz_8i32(
; SSE42-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 2
; SSE42-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE42-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP1]], i1 false)
; SSE42-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP2]], i1 false)
; SSE42-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 2
; SSE42-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE42-NEXT: ret void
;		;
; AVX-LABEL: @cttz_8i32(		; AVX-LABEL: @cttz_8i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([8 x i32]* @src32 to <8 x i32>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([8 x i32]* @src32 to <8 x i32>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = call <8 x i32> @llvm.cttz.v8i32(<8 x i32> [[TMP1]], i1 false)		; AVX-NEXT: [[TMP2:%.*]] = call <8 x i32> @llvm.cttz.v8i32(<8 x i32> [[TMP1]], i1 false)
; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([8 x i32]* @dst32 to <8 x i32>*), align 2		; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([8 x i32]* @dst32 to <8 x i32>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
%ld0 = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 0), align 2		%ld0 = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 0), align 2
[347 lines not shown]
store i64 %cttz0, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i64 0, i64 0), align 4		store i64 %cttz0, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i64 0, i64 0), align 4
store i64 %cttz1, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i64 0, i64 1), align 4		store i64 %cttz1, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i64 0, i64 1), align 4
store i64 %cttz2, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i64 0, i64 2), align 4		store i64 %cttz2, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i64 0, i64 2), align 4
store i64 %cttz3, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i64 0, i64 3), align 4		store i64 %cttz3, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @dst64, i64 0, i64 3), align 4
ret void		ret void
}		}

define void @cttz_undef_4i32() #0 {		define void @cttz_undef_4i32() #0 {
; SSE2-LABEL: @cttz_undef_4i32(		; SSE-LABEL: @cttz_undef_4i32(
; SSE2-NEXT: [[LD0:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 0), align 4		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 4
; SSE2-NEXT: [[LD1:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 1), align 4		; SSE-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP1]], i1 true)
; SSE2-NEXT: [[LD2:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 2), align 4		; SSE-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 4
; SSE2-NEXT: [[LD3:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 3), align 4		; SSE-NEXT: ret void
; SSE2-NEXT: [[CTTZ0:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD0]], i1 true)
; SSE2-NEXT: [[CTTZ1:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD1]], i1 true)
; SSE2-NEXT: [[CTTZ2:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD2]], i1 true)
; SSE2-NEXT: [[CTTZ3:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD3]], i1 true)
; SSE2-NEXT: store i32 [[CTTZ0]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 0), align 4
; SSE2-NEXT: store i32 [[CTTZ1]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 1), align 4
; SSE2-NEXT: store i32 [[CTTZ2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 2), align 4
; SSE2-NEXT: store i32 [[CTTZ3]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 3), align 4
; SSE2-NEXT: ret void
;
; SSE42-LABEL: @cttz_undef_4i32(
; SSE42-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 4
; SSE42-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP1]], i1 true)
; SSE42-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 4
; SSE42-NEXT: ret void
;		;
; AVX1-LABEL: @cttz_undef_4i32(		; AVX1-LABEL: @cttz_undef_4i32(
; AVX1-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 4		; AVX1-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 4
; AVX1-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP1]], i1 true)		; AVX1-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP1]], i1 true)
; AVX1-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 4		; AVX1-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 4
; AVX1-NEXT: ret void		; AVX1-NEXT: ret void
;		;
; AVX2-LABEL: @cttz_undef_4i32(		; AVX2-LABEL: @cttz_undef_4i32(
[22 lines not shown]
store i32 %cttz0, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 0), align 4		store i32 %cttz0, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 0), align 4
store i32 %cttz1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 1), align 4		store i32 %cttz1, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 1), align 4
store i32 %cttz2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 2), align 4		store i32 %cttz2, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 2), align 4
store i32 %cttz3, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 3), align 4		store i32 %cttz3, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 3), align 4
ret void		ret void
}		}

define void @cttz_undef_8i32() #0 {		define void @cttz_undef_8i32() #0 {
; SSE2-LABEL: @cttz_undef_8i32(		; SSE-LABEL: @cttz_undef_8i32(
; SSE2-NEXT: [[LD0:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 0), align 2		; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 2
; SSE2-NEXT: [[LD1:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 1), align 2		; SSE-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE2-NEXT: [[LD2:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 2), align 2		; SSE-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP1]], i1 true)
; SSE2-NEXT: [[LD3:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 3), align 2		; SSE-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP2]], i1 true)
; SSE2-NEXT: [[LD4:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 4), align 2		; SSE-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 2
; SSE2-NEXT: [[LD5:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 5), align 2		; SSE-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE2-NEXT: [[LD6:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 6), align 2		; SSE-NEXT: ret void
; SSE2-NEXT: [[LD7:%.*]] = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 7), align 2
; SSE2-NEXT: [[CTTZ0:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD0]], i1 true)
; SSE2-NEXT: [[CTTZ1:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD1]], i1 true)
; SSE2-NEXT: [[CTTZ2:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD2]], i1 true)
; SSE2-NEXT: [[CTTZ3:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD3]], i1 true)
; SSE2-NEXT: [[CTTZ4:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD4]], i1 true)
; SSE2-NEXT: [[CTTZ5:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD5]], i1 true)
; SSE2-NEXT: [[CTTZ6:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD6]], i1 true)
; SSE2-NEXT: [[CTTZ7:%.*]] = call i32 @llvm.cttz.i32(i32 [[LD7]], i1 true)
; SSE2-NEXT: store i32 [[CTTZ0]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 0), align 2
; SSE2-NEXT: store i32 [[CTTZ1]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 1), align 2
; SSE2-NEXT: store i32 [[CTTZ2]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 2), align 2
; SSE2-NEXT: store i32 [[CTTZ3]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 3), align 2
; SSE2-NEXT: store i32 [[CTTZ4]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4), align 2
; SSE2-NEXT: store i32 [[CTTZ5]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 5), align 2
; SSE2-NEXT: store i32 [[CTTZ6]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 6), align 2
; SSE2-NEXT: store i32 [[CTTZ7]], i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 7), align 2
; SSE2-NEXT: ret void
;
; SSE42-LABEL: @cttz_undef_8i32(
; SSE42-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([8 x i32]* @src32 to <4 x i32>*), align 2
; SSE42-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE42-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP1]], i1 true)
; SSE42-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> [[TMP2]], i1 true)
; SSE42-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* bitcast ([8 x i32]* @dst32 to <4 x i32>*), align 2
; SSE42-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* bitcast (i32* getelementptr inbounds ([8 x i32], [8 x i32]* @dst32, i32 0, i64 4) to <4 x i32>*), align 2
; SSE42-NEXT: ret void
;		;
; AVX-LABEL: @cttz_undef_8i32(		; AVX-LABEL: @cttz_undef_8i32(
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([8 x i32]* @src32 to <8 x i32>*), align 2		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([8 x i32]* @src32 to <8 x i32>*), align 2
; AVX-NEXT: [[TMP2:%.*]] = call <8 x i32> @llvm.cttz.v8i32(<8 x i32> [[TMP1]], i1 true)		; AVX-NEXT: [[TMP2:%.*]] = call <8 x i32> @llvm.cttz.v8i32(<8 x i32> [[TMP1]], i1 true)
; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([8 x i32]* @dst32 to <8 x i32>*), align 2		; AVX-NEXT: store <8 x i32> [[TMP2]], <8 x i32>* bitcast ([8 x i32]* @dst32 to <8 x i32>*), align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
%ld0 = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 0), align 2		%ld0 = load i32, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @src32, i32 0, i64 0), align 2
[301 lines not shown]

llvm/test/Transforms/SimplifyCFG/X86/speculate-cttz-ctlz.ll

[401 lines not shown]
cond.true: ; preds = %entry
br label %cond.end		br label %cond.end

cond.end: ; preds = %entry, %cond.true		cond.end: ; preds = %entry, %cond.true
%cond = phi i16 [ %cast, %cond.true ], [ 32, %entry ]		%cond = phi i16 [ %cast, %cond.true ], [ 32, %entry ]
ret i16 %cond		ret i16 %cond
}		}

define i16 @test9_loop(i32 %x, i16* %ptr) {		define i16 @test9_loop(i32 %x, i16* %ptr) {
; ALL-LABEL: @test9_loop(		; BMI-LABEL: @test9_loop(
; ALL-NEXT: entry:		; BMI-NEXT: entry:
; ALL-NEXT: br label [[LOOP_HEADER:%.*]]		; BMI-NEXT: br label [[LOOP_HEADER:%.*]]
; ALL: loop.header:		; BMI: loop.header:
; ALL-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_HEADER]] ]		; BMI-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_HEADER]] ]
; ALL-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[X:%.*]], 0		; BMI-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[X:%.*]], 0
; ALL-NEXT: [[XOR:%.*]] = xor i32 [[X]], -1		; BMI-NEXT: [[XOR:%.*]] = xor i32 [[X]], -1
; ALL-NEXT: [[TMP0:%.*]] = tail call i32 @llvm.cttz.i32(i32 [[XOR]], i1 true)		; BMI-NEXT: [[TMP0:%.*]] = tail call i32 @llvm.cttz.i32(i32 [[XOR]], i1 true)
; ALL-NEXT: [[CAST:%.*]] = trunc i32 [[TMP0]] to i16		; BMI-NEXT: [[CAST:%.*]] = trunc i32 [[TMP0]] to i16
; ALL-NEXT: [[COND:%.*]] = select i1 [[TOBOOL]], i16 32, i16 [[CAST]]		; BMI-NEXT: [[COND:%.*]] = select i1 [[TOBOOL]], i16 32, i16 [[CAST]]
; ALL-NEXT: store i16 [[COND]], i16* [[PTR:%.*]], align 2		; BMI-NEXT: store i16 [[COND]], i16* [[PTR:%.*]], align 2
; ALL-NEXT: [[IV_NEXT]] = add i32 [[IV]], 1		; BMI-NEXT: [[IV_NEXT]] = add i32 [[IV]], 1
; ALL-NEXT: [[EC:%.*]] = icmp eq i32 [[IV]], 100		; BMI-NEXT: [[EC:%.*]] = icmp eq i32 [[IV]], 100
; ALL-NEXT: br i1 [[EC]], label [[LOOP_EXIT:%.*]], label [[LOOP_HEADER]]		; BMI-NEXT: br i1 [[EC]], label [[LOOP_EXIT:%.*]], label [[LOOP_HEADER]]
; ALL: loop.exit:		; BMI: loop.exit:
; ALL-NEXT: ret i16 [[COND]]		; BMI-NEXT: ret i16 [[COND]]
		;
		; LZCNT-LABEL: @test9_loop(
		; LZCNT-NEXT: entry:
		; LZCNT-NEXT: br label [[LOOP_HEADER:%.*]]
		; LZCNT: loop.header:
		; LZCNT-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[COND_END:%.*]] ]
		; LZCNT-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[X:%.*]], 0
		; LZCNT-NEXT: br i1 [[TOBOOL]], label [[COND_END]], label [[COND_TRUE:%.*]]
		; LZCNT: cond.true:
		; LZCNT-NEXT: [[XOR:%.*]] = xor i32 [[X]], -1
		; LZCNT-NEXT: [[TMP0:%.*]] = tail call i32 @llvm.cttz.i32(i32 [[XOR]], i1 true)
		; LZCNT-NEXT: [[CAST:%.*]] = trunc i32 [[TMP0]] to i16
		; LZCNT-NEXT: br label [[COND_END]]
		; LZCNT: cond.end:
		; LZCNT-NEXT: [[COND:%.*]] = phi i16 [ [[CAST]], [[COND_TRUE]] ], [ 32, [[LOOP_HEADER]] ]
		; LZCNT-NEXT: store i16 [[COND]], i16* [[PTR:%.*]], align 2
		; LZCNT-NEXT: [[IV_NEXT]] = add i32 [[IV]], 1
		; LZCNT-NEXT: [[EC:%.*]] = icmp eq i32 [[IV]], 100
		; LZCNT-NEXT: br i1 [[EC]], label [[LOOP_EXIT:%.*]], label [[LOOP_HEADER]]
		; LZCNT: loop.exit:
		; LZCNT-NEXT: ret i16 [[COND]]
		;
		; GENERIC-LABEL: @test9_loop(
		; GENERIC-NEXT: entry:
		; GENERIC-NEXT: br label [[LOOP_HEADER:%.*]]
		; GENERIC: loop.header:
		; GENERIC-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[COND_END:%.*]] ]
		; GENERIC-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[X:%.*]], 0
		; GENERIC-NEXT: br i1 [[TOBOOL]], label [[COND_END]], label [[COND_TRUE:%.*]]
		; GENERIC: cond.true:
		; GENERIC-NEXT: [[XOR:%.*]] = xor i32 [[X]], -1
		; GENERIC-NEXT: [[TMP0:%.*]] = tail call i32 @llvm.cttz.i32(i32 [[XOR]], i1 true)
		; GENERIC-NEXT: [[CAST:%.*]] = trunc i32 [[TMP0]] to i16
		; GENERIC-NEXT: br label [[COND_END]]
		; GENERIC: cond.end:
		; GENERIC-NEXT: [[COND:%.*]] = phi i16 [ [[CAST]], [[COND_TRUE]] ], [ 32, [[LOOP_HEADER]] ]
		; GENERIC-NEXT: store i16 [[COND]], i16* [[PTR:%.*]], align 2
		; GENERIC-NEXT: [[IV_NEXT]] = add i32 [[IV]], 1
		; GENERIC-NEXT: [[EC:%.*]] = icmp eq i32 [[IV]], 100
		; GENERIC-NEXT: br i1 [[EC]], label [[LOOP_EXIT:%.*]], label [[LOOP_HEADER]]
		; GENERIC: loop.exit:
		; GENERIC-NEXT: ret i16 [[COND]]
;		;
entry:		entry:
br label %loop.header		br label %loop.header

loop.header:		loop.header:
%iv = phi i32 [ 0, %entry ], [ %iv.next, %cond.end ]		%iv = phi i32 [ 0, %entry ], [ %iv.next, %cond.end ]
%tobool = icmp eq i32 %x, 0		%tobool = icmp eq i32 %x, 0
br i1 %tobool, label %cond.end, label %cond.true		br i1 %tobool, label %cond.end, label %cond.true
[24 lines not shown]

Revision Contents

Diff 298717

llvm/include/llvm/CodeGen/BasicTTIImpl.h

llvm/test/Analysis/CostModel/X86/cttz.ll

llvm/test/Analysis/CostModel/X86/intrinsic-cost-kinds.ll

llvm/test/CodeGen/X86/dagcombine-select.ll

llvm/test/Transforms/SLPVectorizer/X86/cttz.ll

llvm/test/Transforms/SimplifyCFG/X86/speculate-cttz-ctlz.ll