This is an archive of the discontinued LLVM Phabricator instance.

Differential D84995

[AMDGPU][CostModel] Add f16, f64 and contract cases to fused costs estimation.
ClosedPublic

Authored by dfukalov on Jul 30 2020, 5:57 PM.

Details

Reviewers

rampitec

Commits

rG4ccc38813eb7: [AMDGPU][CostModel] Add f16, f64 and contract cases to fused costs estimation.

Summary

Add cases of fused fmul+fadd/fsub with f16 and f64 operands to cost model.
Also added operations with contract attribute.

Fixed line endings in test.
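
The decision this patch encodes can be sketched as a standalone predicate. This is a simplified model of the logic in Diff 283410, not the LLVM code itself: `ScalarType`, `FusionContext` and `isFMulFree` are hypothetical names, and each field stands in for a query the real code makes on the instruction, the subtarget, or the target options.

```cpp
// Simplified, self-contained model of the FMUL fusion check in this patch.
// All names below are hypothetical stand-ins, not LLVM APIs.
enum class ScalarType { F16, F32, F64 };

struct FusionContext {
  bool MulHasOneUse;         // models CxtI->hasOneUse()
  bool UserIsFAddOrFSub;     // models the FADD/FSUB opcode check on the user
  bool HasFP32Denormals;     // f32 denormal mode of the function
  bool HasFP64FP16Denormals; // f64/f16 denormal mode of the function
  bool Has16BitInsts;        // models ST->has16BitInsts()
  bool FastFusionOrUnsafe;   // AllowFPOpFusion == Fast or UnsafeFPMath
  bool BothHaveContract;     // fmul and its user both carry `contract`
};

// Returns true when the fmul would be reported as free (cost 0) because it
// is expected to fold into the fadd/fsub that consumes it.
inline bool isFMulFree(ScalarType SLT, const FusionContext &C) {
  if (!C.MulHasOneUse || !C.UserIsFAddOrFSub)
    return false; // the real code falls through to the regular cost here
  if (SLT == ScalarType::F32 && !C.HasFP32Denormals)
    return true;
  if (C.Has16BitInsts && SLT == ScalarType::F16 && !C.HasFP64FP16Denormals)
    return true;
  // Any type may be fused when fast fusion, unsafe math, or contract flags
  // on both instructions allow it.
  return C.FastFusionOrUnsafe || C.BothHaveContract;
}
```

In this model an f64 fmul can only become free through the last rule, which lines up with the CONTRACT/NOCONTRACT prefixes used for the f64 cases in fused_costs.ll.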

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	240 ms	windows > lld.ELF::dependency-file.s

Event Timeline

dfukalov created this revision. Jul 30 2020, 5:57 PM

Herald added a project: Restricted Project. Jul 30 2020, 5:57 PM

Herald added subscribers: kerbowa, hiraditya, t-tye and 7 others.

dfukalov requested review of this revision. Jul 30 2020, 5:57 PM

Herald added a subscriber: wdng. Jul 30 2020, 5:57 PM

Harbormaster completed remote builds in B66483: Diff 282090. Jul 30 2020, 6:41 PM

arsenm added inline comments. Jul 30 2020, 8:21 PM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
514	I think we don't actually need the hasOneUse case. Only with the conditionally available 2-operand VOP2 forms is there a code size savings by not fusing
517	The !HasFP32Denormals check is now applying to all types?

dfukalov added inline comments. Jul 31 2020, 7:31 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
514	Generally speaking, the fused operation cost will be almost the same as that of the FADD/FSUB, which is estimated in detail in the next case of this switch. If the FMUL result is used anywhere other than the FADD/FSUB, the FMUL possibly will not be eliminated by fusing.
517	Yes, since here we do not estimate FMUL cost, but estimate it will be fused and LLVM_FALLTHROUGH in other cases.

rampitec added inline comments. Jul 31 2020, 10:26 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
517	> Yes, since here we do not estimate FMUL cost, but estimate it will be fused and LLVM_FALLTHROUGH in other cases.
	It should not affect anything except f32, and you may return Free based on fp32-denormals for any type.

Change updated with addressed comments.

rampitec added inline comments. Aug 5 2020, 3:18 PM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
517	Also need to check ST->hasMadMacF32Insts().
519	I think you do not need to check has16BitInsts(). If it does not, f16 would be illegal anyway.

rampitec added inline comments. Aug 5 2020, 3:21 PM

llvm/test/Analysis/CostModel/AMDGPU/fused_costs.ll
6	Add a run line with -mcpu=gfx1030 and disabled fp32 denorm support. It does not have f32 mad/mac and shall not be contracted as a result.

Harbormaster completed remote builds in B67202: Diff 283410. Aug 5 2020, 4:14 PM

Check for hasMadMacF32Insts() added.
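
A minimal sketch of how the f32 rule might look once the subtarget check is included. It illustrates the reviewer's request and is not the committed code: `SubtargetModel` and `isF32FMulFreeByDenormMode` are hypothetical names, and the boolean field stands in for `ST->hasMadMacF32Insts()`.

```cpp
// Hypothetical stand-in for the one subtarget query this rule needs.
struct SubtargetModel {
  bool HasMadMacF32Insts; // models ST->hasMadMacF32Insts()
};

// An f32 fmul is treated as free through the denormal-mode rule only when
// the subtarget has f32 mad/mac instructions to fuse into and fp32
// denormals are disabled.
inline bool isF32FMulFreeByDenormMode(const SubtargetModel &ST,
                                      bool HasFP32Denormals) {
  return ST.HasMadMacF32Insts && !HasFP32Denormals;
}
```

For a target without f32 mad/mac (gfx1030, according to the review comment above), the predicate is false even with fp32 denormals disabled, so the fmul keeps its regular cost unless another rule applies.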

Harbormaster completed remote builds in B67283: Diff 283540. Aug 6 2020, 3:19 AM

dfukalov marked 2 inline comments as done. Aug 6 2020, 9:20 AM

dfukalov added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
519	I used it because it is also checked in the cost estimation for the corresponding FADD/FSUB (below). As I understand it, targets without fp16 instruction support will not fuse fmul+fadd either, so we should LLVM_FALLTHROUGH in that case.

LGTM

This revision is now accepted and ready to land. Aug 6 2020, 9:22 AM

This revision was landed with ongoing or failed builds. Aug 6 2020, 11:43 AM

Closed by commit rG4ccc38813eb7: [AMDGPU][CostModel] Add f16, f64 and contract cases to fused costs estimation. (authored by dfukalov).

This revision was automatically updated to reflect the committed changes.

dfukalov added a commit: rG4ccc38813eb7: [AMDGPU][CostModel] Add f16, f64 and contract cases to fused costs estimation..

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h	23 lines
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp	14 lines
llvm/test/Analysis/CostModel/AMDGPU/fused_costs.ll	205 lines

Diff 283410

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h

@@ 72 lines hidden @@ class GCNTTIImpl final : public BasicTTIImplBase<GCNTTIImpl> {
   friend BaseT;

   const GCNSubtarget *ST;
   const SITargetLowering *TLI;
   AMDGPUTTIImpl CommonTTI;
   bool IsGraphicsShader;
   bool HasFP32Denormals;
+  bool HasFP64FP16Denormals;
   unsigned MaxVGPRs;

   const FeatureBitset InlineFeatureIgnoreList = {
     // Codegen control options which don't matter.
     AMDGPU::FeatureEnableLoadStoreOpt,
     AMDGPU::FeatureEnableSIScheduler,
     AMDGPU::FeatureEnableUnsafeDSOffsetFolding,
     AMDGPU::FeatureFlatForGlobal,
@@ 39 lines hidden @@ class GCNTTIImpl final : public BasicTTIImplBase<GCNTTIImpl> {
   // quarter. This also applies to some integer operations.
   inline int get64BitInstrCost() const {
     return ST->hasHalfRate64Ops() ?
       getHalfRateInstrCost() : getQuarterRateInstrCost();
   }

 public:
   explicit GCNTTIImpl(const AMDGPUTargetMachine *TM, const Function &F)
     : BaseT(TM, F.getParent()->getDataLayout()),
-      ST(static_cast<const GCNSubtarget*>(TM->getSubtargetImpl(F))),
-      TLI(ST->getTargetLowering()),
-      CommonTTI(TM, F),
+      ST(static_cast<const GCNSubtarget *>(TM->getSubtargetImpl(F))),
+      TLI(ST->getTargetLowering()), CommonTTI(TM, F),
       IsGraphicsShader(AMDGPU::isShader(F.getCallingConv())),
-      HasFP32Denormals(AMDGPU::SIModeRegisterDefaults(F).allFP32Denormals()),
       MaxVGPRs(ST->getMaxNumVGPRs(
           std::max(ST->getWavesPerEU(F).first,
                    ST->getWavesPerEUForWorkGroup(
-                       ST->getFlatWorkGroupSizes(F).second)))) {}
+                       ST->getFlatWorkGroupSizes(F).second)))) {
+    AMDGPU::SIModeRegisterDefaults Mode(F);
+    HasFP32Denormals = Mode.allFP32Denormals();
+    HasFP64FP16Denormals = Mode.allFP64FP16Denormals();
+  }

   bool hasBranchDivergence() { return true; }
   bool useGPUDivergenceAnalysis() const;

   void getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
                                TTI::UnrollingPreferences &UP);

   void getPeelingPreferences(Loop *L, ScalarEvolution &SE,
@@ 149 lines hidden @@

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

@@ 504 lines hidden @@ case ISD::MUL: {
     // i32
     return QuarterRateCost * NElts * LT.first;
   }
   case ISD::FMUL:
     // Check possible fuse {fadd|fsub}(a,fmul(b,c)) and return zero cost for
     // fmul(b,c) supposing the fadd|fsub will get estimated cost for the whole
     // fused operation.
-    if (!HasFP32Denormals && SLT == MVT::f32 && CxtI && CxtI->hasOneUse())
+    if (CxtI && CxtI->hasOneUse())
       if (const auto *FAdd = dyn_cast<BinaryOperator>(*CxtI->user_begin())) {
         const int OPC = TLI->InstructionOpcodeToISD(FAdd->getOpcode());
         if (OPC == ISD::FADD || OPC == ISD::FSUB) {
+          if (SLT == MVT::f32 && !HasFP32Denormals)
+            return TargetTransformInfo::TCC_Free;
+          if (ST->has16BitInsts() && SLT == MVT::f16 && !HasFP64FP16Denormals)
+            return TargetTransformInfo::TCC_Free;
+
+          // Estimate all types may be fused with contract/unsafe flags
+          const TargetOptions &Options = TLI->getTargetMachine().Options;
+          if (Options.AllowFPOpFusion == FPOpFusion::Fast ||
+              Options.UnsafeFPMath ||
+              (FAdd->hasAllowContract() && CxtI->hasAllowContract()))
             return TargetTransformInfo::TCC_Free;
         }
       }
     LLVM_FALLTHROUGH;
   case ISD::FADD:
   case ISD::FSUB:
     if (SLT == MVT::f64)
       return LT.first * NElts * get64BitInstrCost();

@@ 607 lines hidden @@
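
The f64 branch of the FADD/FSUB case multiplies three integers, `LT.first * NElts * get64BitInstrCost()`. A minimal sketch of that arithmetic follows. `f64ArithCost` is a hypothetical helper, and the inputs used in the note after it (a legalization factor of 1 and a per-instruction cost of 3) are assumptions chosen to match the expectations in fused_costs.ll, not values read from the subtarget.

```cpp
// Mirrors the expression `LT.first * NElts * get64BitInstrCost()`.
// All three inputs are plain integers here; in LLVM they come from type
// legalization and the subtarget. The function name is hypothetical.
constexpr int f64ArithCost(int LegalizationFactor, int NElts, int Cost64) {
  return LegalizationFactor * NElts * Cost64;
}
```

Under those assumptions, `f64ArithCost(1, 1, 3)` gives 3 for a scalar double and `f64ArithCost(1, 2, 3)` gives 6 for `<2 x double>`, the costs the test expects for the unfused f64 operations.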

llvm/test/Analysis/CostModel/AMDGPU/fused_costs.ll

-; RUN: opt -cost-model -analyze -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx900 -denormal-fp-math-f32=preserve-sign < %s | FileCheck -check-prefixes=FUSED,ALL %s
-; RUN: opt -cost-model -analyze -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx900 -denormal-fp-math-f32=ieee < %s | FileCheck -check-prefixes=SLOW,ALL %s
-; RUN: opt -cost-model -cost-kind=code-size -analyze -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx900 -denormal-fp-math-f32=preserve-sign < %s | FileCheck -check-prefixes=FUSED,ALL %s
-; RUN: opt -cost-model -cost-kind=code-size -analyze -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx900 -denormal-fp-math-f32=ieee < %s | FileCheck -check-prefixes=SLOW,ALL %s
-
-target triple = "amdgcn--"
-
-; ALL-LABEL: 'fmul_fadd_f32':
-; FUSED: estimated cost of 0 for instruction: %mul = fmul float
-; SLOW: estimated cost of 1 for instruction: %mul = fmul float
-; ALL: estimated cost of 1 for instruction: %add = fadd float
-define float @fmul_fadd_f32(float %r0, float %r1, float %r2) #0 {
-  %mul = fmul float %r0, %r1
-  %add = fadd float %mul, %r2
-  ret float %add
-}
-
-; ALL-LABEL: 'fmul_fadd_v2f32':
-; FUSED: estimated cost of 0 for instruction: %mul = fmul <2 x float>
-; SLOW: estimated cost of 2 for instruction: %mul = fmul <2 x float>
-; ALL: estimated cost of 2 for instruction: %add = fadd <2 x float>
-define <2 x float> @fmul_fadd_v2f32(<2 x float> %r0, <2 x float> %r1, <2 x float> %r2) #0 {
-  %mul = fmul <2 x float> %r0, %r1
-  %add = fadd <2 x float> %mul, %r2
-  ret <2 x float> %add
-}
-
-; ALL-LABEL: 'fmul_fsub_f32':
-; FUSED: estimated cost of 0 for instruction: %mul = fmul float
-; SLOW: estimated cost of 1 for instruction: %mul = fmul float
-; ALL: estimated cost of 1 for instruction: %sub = fsub float
-define float @fmul_fsub_f32(float %r0, float %r1, float %r2) #0 {
-  %mul = fmul float %r0, %r1
-  %sub = fsub float %mul, %r2
-  ret float %sub
-}
-
-; ALL-LABEL: 'fmul_fsub_v2f32':
-; FUSED: estimated cost of 0 for instruction: %mul = fmul <2 x float>
-; SLOW: estimated cost of 2 for instruction: %mul = fmul <2 x float>
-; ALL: estimated cost of 2 for instruction: %sub = fsub <2 x float>
-define <2 x float> @fmul_fsub_v2f32(<2 x float> %r0, <2 x float> %r1, <2 x float> %r2) #0 {
-  %mul = fmul <2 x float> %r0, %r1
-  %sub = fsub <2 x float> %mul, %r2
-  ret <2 x float> %sub
-}
-
-attributes #0 = { nounwind }
+; RUN: opt -cost-model -analyze -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx900 -denormal-fp-math-f32=preserve-sign -denormal-fp-math=preserve-sign -fp-contract=on < %s | FileCheck -check-prefixes=FUSED,NOCONTRACT,ALL %s
+; RUN: opt -cost-model -analyze -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx900 -denormal-fp-math-f32=ieee -denormal-fp-math=ieee -fp-contract=on < %s | FileCheck -check-prefixes=SLOW,NOCONTRACT,ALL %s
+; RUN: opt -cost-model -analyze -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx900 -denormal-fp-math-f32=ieee -denormal-fp-math=ieee -fp-contract=fast < %s | FileCheck -check-prefixes=FUSED,CONTRACT,ALL %s
+; RUN: opt -cost-model -cost-kind=code-size -analyze -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx900 -denormal-fp-math-f32=preserve-sign -denormal-fp-math=preserve-sign -fp-contract=on < %s | FileCheck -check-prefixes=FUSED32,FUSED16,NOCONTRACT,ALL %s
+; RUN: opt -cost-model -cost-kind=code-size -analyze -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx900 -denormal-fp-math-f32=ieee -denormal-fp-math=ieee -fp-contract=on < %s | FileCheck -check-prefixes=SLOW,NOCONTRACT,ALL %s
+; RUN: opt -cost-model -cost-kind=code-size -analyze -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx900 -denormal-fp-math-f32=ieee -denormal-fp-math=ieee -fp-contract=fast < %s | FileCheck -check-prefixes=FUSED32,FUSED16,CONTRACT,ALL %s
+
+target triple = "amdgcn--"
+
+; ALL-LABEL: 'fmul_fadd_f32':
+; FUSED: estimated cost of 0 for instruction: %mul = fmul float
+; SLOW: estimated cost of 1 for instruction: %mul = fmul float
+; ALL: estimated cost of 1 for instruction: %add = fadd float
+define float @fmul_fadd_f32(float %r0, float %r1, float %r2) #0 {
+  %mul = fmul float %r0, %r1
+  %add = fadd float %mul, %r2
+  ret float %add
+}
+
+; ALL-LABEL: 'fmul_fadd_contract_f32':
+; ALL: estimated cost of 0 for instruction: %mul = fmul contract float
+; ALL: estimated cost of 1 for instruction: %add = fadd contract float
+define float @fmul_fadd_contract_f32(float %r0, float %r1, float %r2) #0 {
+  %mul = fmul contract float %r0, %r1
+  %add = fadd contract float %mul, %r2
+  ret float %add
+}
+
+; ALL-LABEL: 'fmul_fadd_v2f32':
+; FUSED: estimated cost of 0 for instruction: %mul = fmul <2 x float>
+; SLOW: estimated cost of 2 for instruction: %mul = fmul <2 x float>
+; ALL: estimated cost of 2 for instruction: %add = fadd <2 x float>
+define <2 x float> @fmul_fadd_v2f32(<2 x float> %r0, <2 x float> %r1, <2 x float> %r2) #0 {
+  %mul = fmul <2 x float> %r0, %r1
+  %add = fadd <2 x float> %mul, %r2
+  ret <2 x float> %add
+}
+
+; ALL-LABEL: 'fmul_fsub_f32':
+; FUSED: estimated cost of 0 for instruction: %mul = fmul float
+; SLOW: estimated cost of 1 for instruction: %mul = fmul float
+; ALL: estimated cost of 1 for instruction: %sub = fsub float
+define float @fmul_fsub_f32(float %r0, float %r1, float %r2) #0 {
+  %mul = fmul float %r0, %r1
+  %sub = fsub float %mul, %r2
+  ret float %sub
+}
+
+; ALL-LABEL: 'fmul_fsub_v2f32':
+; FUSED: estimated cost of 0 for instruction: %mul = fmul <2 x float>
+; SLOW: estimated cost of 2 for instruction: %mul = fmul <2 x float>
+; ALL: estimated cost of 2 for instruction: %sub = fsub <2 x float>
+define <2 x float> @fmul_fsub_v2f32(<2 x float> %r0, <2 x float> %r1, <2 x float> %r2) #0 {
+  %mul = fmul <2 x float> %r0, %r1
+  %sub = fsub <2 x float> %mul, %r2
+  ret <2 x float> %sub
+}
+
+; ALL-LABEL: 'fmul_fadd_f16':
+; FUSED: estimated cost of 0 for instruction: %mul = fmul half
+; SLOW: estimated cost of 1 for instruction: %mul = fmul half
+; ALL: estimated cost of 1 for instruction: %add = fadd half
+define half @fmul_fadd_f16(half %r0, half %r1, half %r2) #0 {
+  %mul = fmul half %r0, %r1
+  %add = fadd half %mul, %r2
+  ret half %add
+}
+
+; ALL-LABEL: 'fmul_fadd_contract_f16':
+; ALL: estimated cost of 0 for instruction: %mul = fmul contract half
+; ALL: estimated cost of 1 for instruction: %add = fadd contract half
+define half @fmul_fadd_contract_f16(half %r0, half %r1, half %r2) #0 {
+  %mul = fmul contract half %r0, %r1
+  %add = fadd contract half %mul, %r2
+  ret half %add
+}
+
+; ALL-LABEL: 'fmul_fadd_v2f16':
+; FUSED: estimated cost of 0 for instruction: %mul = fmul <2 x half>
+; SLOW: estimated cost of 1 for instruction: %mul = fmul <2 x half>
+; ALL: estimated cost of 1 for instruction: %add = fadd <2 x half>
+define <2 x half> @fmul_fadd_v2f16(<2 x half> %r0, <2 x half> %r1, <2 x half> %r2) #0 {
+  %mul = fmul <2 x half> %r0, %r1
+  %add = fadd <2 x half> %mul, %r2
+  ret <2 x half> %add
+}
+
+; ALL-LABEL: 'fmul_fsub_f16':
+; FUSED: estimated cost of 0 for instruction: %mul = fmul half
+; SLOW: estimated cost of 1 for instruction: %mul = fmul half
+; ALL: estimated cost of 1 for instruction: %sub = fsub half
+define half @fmul_fsub_f16(half %r0, half %r1, half %r2) #0 {
+  %mul = fmul half %r0, %r1
+  %sub = fsub half %mul, %r2
+  ret half %sub
+}
+
+; ALL-LABEL: 'fmul_fsub_v2f16':
+; FUSED: estimated cost of 0 for instruction: %mul = fmul <2 x half>
+; SLOW: estimated cost of 1 for instruction: %mul = fmul <2 x half>
+; ALL: estimated cost of 1 for instruction: %sub = fsub <2 x half>
+define <2 x half> @fmul_fsub_v2f16(<2 x half> %r0, <2 x half> %r1, <2 x half> %r2) #0 {
+  %mul = fmul <2 x half> %r0, %r1
+  %sub = fsub <2 x half> %mul, %r2
+  ret <2 x half> %sub
+}
+
+; ALL-LABEL: 'fmul_fadd_f64':
+; CONTRACT: estimated cost of 0 for instruction: %mul = fmul double
+; NOCONTRACT: estimated cost of 3 for instruction: %mul = fmul double
+; ALL: estimated cost of 3 for instruction: %add = fadd double
+define double @fmul_fadd_f64(double %r0, double %r1, double %r2) #0 {
+  %mul = fmul double %r0, %r1
+  %add = fadd double %mul, %r2
+  ret double %add
+}
+
+; ALL-LABEL: 'fmul_fadd_contract_f64':
+; ALL: estimated cost of 0 for instruction: %mul = fmul contract double
+; ALL: estimated cost of 3 for instruction: %add = fadd contract double
+define double @fmul_fadd_contract_f64(double %r0, double %r1, double %r2) #0 {
+  %mul = fmul contract double %r0, %r1
+  %add = fadd contract double %mul, %r2
+  ret double %add
+}
+
+; ALL-LABEL: 'fmul_fadd_v2f64':
+; CONTRACT: estimated cost of 0 for instruction: %mul = fmul <2 x double>
+; NOCONTRACT: estimated cost of 6 for instruction: %mul = fmul <2 x double>
+; ALL: estimated cost of 6 for instruction: %add = fadd <2 x double>
+define <2 x double> @fmul_fadd_v2f64(<2 x double> %r0, <2 x double> %r1, <2 x double> %r2) #0 {
+  %mul = fmul <2 x double> %r0, %r1
+  %add = fadd <2 x double> %mul, %r2
+  ret <2 x double> %add
+}
+
+; ALL-LABEL: 'fmul_fsub_f64':
+; CONTRACT: estimated cost of 0 for instruction: %mul = fmul double
+; NOCONTRACT: estimated cost of 3 for instruction: %mul = fmul double
+; ALL: estimated cost of 3 for instruction: %sub = fsub double
+define double @fmul_fsub_f64(double %r0, double %r1, double %r2) #0 {
+  %mul = fmul double %r0, %r1
+  %sub = fsub double %mul, %r2
+  ret double %sub
+}
+
+; ALL-LABEL: 'fmul_fsub_v2f64':
+; CONTRACT: estimated cost of 0 for instruction: %mul = fmul <2 x double>
+; NOCONTRACT: estimated cost of 6 for instruction: %mul = fmul <2 x double>
+; ALL: estimated cost of 6 for instruction: %sub = fsub <2 x double>
+define <2 x double> @fmul_fsub_v2f64(<2 x double> %r0, <2 x double> %r1, <2 x double> %r2) #0 {
+  %mul = fmul <2 x double> %r0, %r1
+  %sub = fsub <2 x double> %mul, %r2
+  ret <2 x double> %sub
+}
+
+attributes #0 = { nounwind }