This is an archive of the discontinued LLVM Phabricator instance.

[CostModel] remove cost-kind predicate for ctlz/cttz intrinsics in basic TTI implementation
Closed · Public

Authored by spatel on Oct 15 2020, 6:35 AM.

Details

Reviewers

samparker
RKSimon
lebedev.ri
dmgreen
fhahn
craig.topper

Commits

rG9f6048f83dc2: [CostModel] remove cost-kind predicate for ctlz/cttz intrinsics in basic TTI…

Summary

The cost modeling for intrinsics is a patchwork built on different expectations from the callers, so it's a mess. I'm hoping to untangle this to allow canonicalization to the new min/max intrinsics in IR.
The general goal is to remove the cost-kind restriction here in the basic implementation class. I.e., if some intrinsic has a throughput cost of 104, assume that it has the same size, latency, and blended costs. Effectively, an intrinsic with cost N is assumed to be composed of N simple instructions. If that's not correct, the target should provide a more accurate override.
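A minimal sketch of that intent, under a simplified model: the enum only mirrors the names of LLVM's TTI::TargetCostKind, and both functions are hypothetical stand-ins, not the real getIntrinsicInstrCost(). Before the change, non-throughput kinds got a separate answer (modeled here as a flat 1, which is what the x86 SIZE checks in this revision showed); after it, one estimate N is reused for every kind.

```cpp
#include <cassert>

// Simplified stand-in for llvm::TargetTransformInfo::TargetCostKind.
// The enumerator names mirror LLVM's; this enum exists only for illustration.
enum class CostKind { RecipThroughput, Latency, CodeSize, SizeAndLatency };

// Hypothetical model of the old ctlz/cttz behavior: only the throughput
// query sees the real estimate; other kinds fell back to a generic answer
// (1 in the x86 test of this revision).
unsigned oldStyleCost(unsigned throughputCost, CostKind kind) {
  if (kind != CostKind::RecipThroughput)
    return 1;
  return throughputCost;
}

// Hypothetical model of the goal: an intrinsic with cost N counts as N simple
// instructions for size, latency, and blended kinds alike, unless a target
// overrides it. (In LLVM the plain Latency kind is diverted before this
// function is reached, so this model is only illustrative for that kind.)
unsigned newStyleCost(unsigned throughputCost, CostKind kind) {
  (void)kind; // deliberately ignored: one answer for every cost kind
  return throughputCost;
}
```

For example, with N = 104, oldStyleCost(104, CostKind::CodeSize) gives 1 while newStyleCost(104, CostKind::CodeSize) gives 104.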

The x86-64 SSE2 subtarget cost diffs require explanation:

The scalar ctlz/cttz costs assume "BSR+XOR+CMOV" or "TEST+BSF+CMOV/BRANCH" sequences, so they are not cheap.
The 128-bit SSE vector width versions assume a cost of 18 or 26 (18 for cttz and 26 for ctlz with i32 elements; no explanation is provided in the cost tables, but this corresponds to a sequence of shift/logic/compare instructions).
The 512-bit vectors in the test file are scaled up by a factor of 4 from the legal (128-bit) vector width costs.
The plain latency cost-kind is not affected in this patch because that calculation is diverted before we get to getIntrinsicInstrCost().
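A quick arithmetic check of the factor-of-4 scaling, as a simplified sketch rather than LLVM's type-legalization code: assuming the widest legal SSE2 vector is 128 bits, a <16 x i32> (512-bit) value splits into four <4 x i32> parts, so 4 × 18 = 72 and 4 × 26 = 104, which are the cttz and ctlz code-size costs in the test. The helper names are hypothetical.

```cpp
#include <cassert>

// Number of legal-width pieces a wide vector splits into. Assumes the vector
// width is an exact multiple of the legal width (true for 512 vs 128 bits).
unsigned numLegalParts(unsigned vectorBits, unsigned legalBits) {
  assert(legalBits > 0 && vectorBits % legalBits == 0);
  return vectorBits / legalBits;
}

// Hypothetical helper: scale a per-legal-vector cost by the split factor.
unsigned scaledCost(unsigned vectorBits, unsigned legalBits,
                    unsigned legalCost) {
  return numLegalParts(vectorBits, legalBits) * legalCost;
}
```

So scaledCost(512, 128, 18) yields the 72 reported for cttz.v16i32, and scaledCost(512, 128, 26) yields the 104 reported for ctlz.v16i32.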

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision. Oct 15 2020, 6:35 AM

Herald added a project: Restricted Project. Oct 15 2020, 6:35 AM

Herald added subscribers: pengfei, mcrosier.

spatel requested review of this revision. Oct 15 2020, 6:35 AM

samparker added a comment.

I'm sorry for not finishing what I started with this intrinsic nonsense... Any simplification of these winding paths sounds great to me.

This revision is now accepted and ready to land. Oct 15 2020, 7:00 AM

In D89461#2332196, @samparker wrote:

> I'm sorry for not finishing what I started with this intrinsic nonsense... Any simplification of these winding paths sounds great to me.

I definitely appreciate what you accomplished! And sorry for not being quicker to review your patches. Intrinsic handling is truly awful...

spatel mentioned this in D89479: [SimplifyCFG] Be more conservative when speculating in loops. (WIP). Oct 15 2020, 9:20 AM

Closed by commit rG9f6048f83dc2: [CostModel] remove cost-kind predicate for ctlz/cttz intrinsics in basic TTI… (authored by spatel). Oct 15 2020, 10:19 AM

This revision was automatically updated to reflect the committed changes.

spatel added a commit: rG9f6048f83dc2: [CostModel] remove cost-kind predicate for ctlz/cttz intrinsics in basic TTI….

spatel mentioned this in D90554: [CostModel] remove cost-kind predicate for intrinsics in basic TTI implementation. Nov 1 2020, 5:45 AM

spatel mentioned this in rGf7eac51b9b3f: [CostModel] remove cost-kind predicate for intrinsics in basic TTI…. Nov 10 2020, 5:25 AM

spatel mentioned this in rGe32bd3512043: [CostModel] mostly remove cost-kind predicate for intrinsics in basic TTI…. Nov 20 2020, 8:37 AM

Revision Contents

llvm/include/llvm/CodeGen/BasicTTIImpl.h (18 lines)
llvm/test/Analysis/CostModel/X86/intrinsic-cost-kinds.ll (16 lines)

Diff 298415

llvm/include/llvm/CodeGen/BasicTTIImpl.h

 unsigned getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
 ...
   switch (IID) {
   default:
     // FIXME: all cost kinds should default to the same thing?
     if (CostKind != TTI::TCK_RecipThroughput)
       return BaseT::getIntrinsicInstrCost(ICA, CostKind);
     break;

   case Intrinsic::cttz:
-    // FIXME: all cost kinds should default to the same thing?
-    if (CostKind != TTI::TCK_RecipThroughput) {
-      if (getTLI()->isCheapToSpeculateCttz())
-        return TargetTransformInfo::TCC_Basic;
-      return BaseT::getIntrinsicInstrCost(ICA, CostKind);
-    }
+    // FIXME: If necessary, this should go in target-specific overrides.
+    if (VF == 1 && RetVF == 1 && getTLI()->isCheapToSpeculateCttz())
+      return TargetTransformInfo::TCC_Basic;
     break;

   case Intrinsic::ctlz:
-    // FIXME: all cost kinds should default to the same thing?
-    if (CostKind != TTI::TCK_RecipThroughput) {
-      if (getTLI()->isCheapToSpeculateCtlz())
-        return TargetTransformInfo::TCC_Basic;
-      return BaseT::getIntrinsicInstrCost(ICA, CostKind);
-    }
+    // FIXME: If necessary, this should go in target-specific overrides.
+    if (VF == 1 && RetVF == 1 && getTLI()->isCheapToSpeculateCtlz())
+      return TargetTransformInfo::TCC_Basic;
     break;

   case Intrinsic::memcpy:
     // FIXME: all cost kinds should default to the same thing?
     if (CostKind != TTI::TCK_RecipThroughput)
       return thisT()->getMemcpyCost(ICA.getInst());
     return BaseT::getIntrinsicInstrCost(ICA, CostKind);
 ...
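The new cttz/ctlz early exit can be modeled in isolation. This is a hedged sketch, not the BasicTTIImpl.h code: the function and parameter names are hypothetical, vf/retVF stand in for the VF/RetVF values the diff tests, cheapToSpeculate stands in for getTLI()->isCheapToSpeculateCttz()/Ctlz(), and genericCost stands in for whatever the rest of getIntrinsicInstrCost() computes after the switch.

```cpp
#include <cassert>

// Value of TargetTransformInfo::TCC_Basic ("one simple instruction").
constexpr unsigned kTCCBasic = 1;

// Hypothetical model of the post-patch cttz/ctlz case.
unsigned cttzCtlzCost(unsigned vf, unsigned retVF, bool cheapToSpeculate,
                      unsigned genericCost) {
  // Only a scalar call on a target where the bit count is cheap gets the
  // flat one-instruction cost. The cost kind is no longer consulted.
  if (vf == 1 && retVF == 1 && cheapToSpeculate)
    return kTCCBasic;
  // Everything else 'break's out of the switch and uses the generic estimate.
  return genericCost;
}
```

The x86 SSE2 test results fit the fall-through path: the scalar cttz code-size cost of 3 indicates the target did not report cttz as cheap to speculate, so the generic estimate was used.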

llvm/test/Analysis/CostModel/X86/intrinsic-cost-kinds.ll

 ...
 ; THRU-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; LATE-LABEL: 'cttz'
 ; LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s = call i32 @llvm.cttz.i32(i32 %a, i1 false)
 ; LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v = call <16 x i32> @llvm.cttz.v16i32(<16 x i32> %va, i1 false)
 ; LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
 ;
 ; SIZE-LABEL: 'cttz'
-; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s = call i32 @llvm.cttz.i32(i32 %a, i1 false)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v = call <16 x i32> @llvm.cttz.v16i32(<16 x i32> %va, i1 false)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %s = call i32 @llvm.cttz.i32(i32 %a, i1 false)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %v = call <16 x i32> @llvm.cttz.v16i32(<16 x i32> %va, i1 false)
 ; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
 ;
 ; SIZE_LATE-LABEL: 'cttz'
-; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s = call i32 @llvm.cttz.i32(i32 %a, i1 false)
-; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v = call <16 x i32> @llvm.cttz.v16i32(<16 x i32> %va, i1 false)
+; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %s = call i32 @llvm.cttz.i32(i32 %a, i1 false)
+; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %v = call <16 x i32> @llvm.cttz.v16i32(<16 x i32> %va, i1 false)
 ; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
 ;
   %s = call i32 @llvm.cttz.i32(i32 %a, i1 false)
   %v = call <16 x i32> @llvm.cttz.v16i32(<16 x i32> %va, i1 false)
   ret void
 }

 define void @ctlz(i32 %a, <16 x i32> %va) {
 ; THRU-LABEL: 'ctlz'
 ; THRU-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %s = call i32 @llvm.ctlz.i32(i32 %a, i1 true)
 ; THRU-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %v = call <16 x i32> @llvm.ctlz.v16i32(<16 x i32> %va, i1 true)
 ; THRU-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; LATE-LABEL: 'ctlz'
 ; LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s = call i32 @llvm.ctlz.i32(i32 %a, i1 true)
 ; LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v = call <16 x i32> @llvm.ctlz.v16i32(<16 x i32> %va, i1 true)
 ; LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
 ;
 ; SIZE-LABEL: 'ctlz'
-; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s = call i32 @llvm.ctlz.i32(i32 %a, i1 true)
-; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v = call <16 x i32> @llvm.ctlz.v16i32(<16 x i32> %va, i1 true)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %s = call i32 @llvm.ctlz.i32(i32 %a, i1 true)
+; SIZE-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %v = call <16 x i32> @llvm.ctlz.v16i32(<16 x i32> %va, i1 true)
 ; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
 ;
 ; SIZE_LATE-LABEL: 'ctlz'
-; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s = call i32 @llvm.ctlz.i32(i32 %a, i1 true)
-; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v = call <16 x i32> @llvm.ctlz.v16i32(<16 x i32> %va, i1 true)
+; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %s = call i32 @llvm.ctlz.i32(i32 %a, i1 true)
+; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %v = call <16 x i32> @llvm.ctlz.v16i32(<16 x i32> %va, i1 true)
 ; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
 ;
   %s = call i32 @llvm.ctlz.i32(i32 %a, i1 true)
   %v = call <16 x i32> @llvm.ctlz.v16i32(<16 x i32> %va, i1 true)
   ret void
 }

 define void @fshl(i32 %a, i32 %b, i32 %c, <16 x i32> %va, <16 x i32> %vb, <16 x i32> %vc) {
 ...
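As a compact summary of the CHECK-line changes, the sketch below records the visible values as plain C++ data and checks at compile time the property this patch is after: for ctlz, SIZE and SIZE_LATE now equal THRU, while LATE stays at 1 because latency is diverted earlier. The struct and constant names are hypothetical, and the cttz THRU values are elided in this view, so they are deliberately not filled in.

```cpp
#include <cassert>

// CHECK values for one test function: the scalar i32 call and the
// <16 x i32> call. Only values visible in the test diff are recorded.
struct CallCosts {
  unsigned scalarI32;
  unsigned vecV16I32;
};

constexpr bool operator==(const CallCosts &a, const CallCosts &b) {
  return a.scalarI32 == b.scalarI32 && a.vecV16I32 == b.vecV16I32;
}

// ctlz after the patch.
constexpr CallCosts kCtlzThru{4, 104};
constexpr CallCosts kCtlzLate{1, 1};       // unchanged by the patch
constexpr CallCosts kCtlzSize{4, 104};     // was {1, 1} before the patch
constexpr CallCosts kCtlzSizeLate{4, 104}; // was {1, 1} before the patch

// cttz SIZE after the patch; was {1, 1} before.
constexpr CallCosts kCttzSize{3, 72};

// Code-size kinds now report the same estimate as throughput for ctlz.
static_assert(kCtlzSize == kCtlzThru, "SIZE should match THRU for ctlz");
static_assert(kCtlzSizeLate == kCtlzThru,
              "SIZE_LATE should match THRU for ctlz");
```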