This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Perform the fold of A - (-B) -> A + B only when it is cheaper
Abandoned · Public

Authored by steven.zhang on Feb 25 2020, 11:57 PM.


Details

Reviewers

rampitec
RKSimon
craig.topper
spatel
dmgreen
arsenm
jsji

Group Reviewers

Restricted Project

Summary

We have a rule to fold A + (-B) -> A - B only when it is cheaper.

// fold (fadd A, (fneg B)) -> (fsub A, B)
if ((!LegalOperations || TLI.isOperationLegalOrCustom(ISD::FSUB, VT)) &&
    TLI.getNegatibleCost(N1, DAG, LegalOperations, ForCodeSize) ==
        TargetLowering::NegatibleCost::Cheaper)
  return DAG.getNode(
      ISD::FSUB, DL, VT, N0,
      TLI.getNegatedExpression(N1, DAG, LegalOperations, ForCodeSize), Flags);

But the reverse fold (A - (-B) -> A + B) is done as long as it is not expensive, which includes the case where it is neutral. This patch restricts the reverse fold to the cheaper case as well, so that we no longer perform a transformation that brings no gain.
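The asymmetry described in the summary can be sketched in isolation. The enum and helper functions below are hypothetical stand-ins written for illustration; they only mirror the two cost comparisons quoted above and are not the LLVM API.

```cpp
#include <cassert>

// Toy stand-in for TargetLowering::NegatibleCost (assumption: three levels,
// as the comparisons in the summary suggest). Not the real LLVM type.
enum class NegatibleCost { Cheaper, Neutral, Expensive };

// (fadd A, (fneg B)) -> (fsub A, B): taken only when negating is a strict win.
bool foldsFAddToFSub(NegatibleCost C) { return C == NegatibleCost::Cheaper; }

// (fsub A, (fneg B)) -> (fadd A, B) before the patch: taken unless expensive,
// so the neutral case is folded too.
bool foldsFSubToFAddCurrent(NegatibleCost C) {
  return C != NegatibleCost::Expensive;
}

// The same fold as proposed by the patch: taken only when cheaper.
bool foldsFSubToFAddPatched(NegatibleCost C) {
  return C == NegatibleCost::Cheaper;
}

// True when the patch changes the decision for a given cost.
bool patchChangesDecision(NegatibleCost C) {
  return foldsFSubToFAddCurrent(C) != foldsFSubToFAddPatched(C);
}
```

In this model the only cost for which the patch changes the decision is `Neutral`, which is exactly the case the review discussion is about.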

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

steven.zhang created this revision. Feb 25 2020, 11:57 PM

Herald added a project: Restricted Project. Feb 25 2020, 11:57 PM

Herald added subscribers: llvm-commits, kerbowa, hiraditya and 4 others.

steven.zhang edited the summary of this revision. Feb 25 2020, 11:59 PM

Herald added a subscriber: wuzish. Feb 25 2020, 11:59 PM

Harbormaster completed remote builds in B47273: Diff 246636. Feb 26 2020, 1:00 AM

The test changes don't immediately stand out to me as improvements.
I'd think the fix should go in other direction.

In D75157#1892866, @lebedev.ri wrote:

The test changes don't immediately stand out to me as improvements.
I'd think the fix should go in other direction.

Yes, there is no improvement, since we know the fold is neutral. I also don't see any improved case in the test changes when we do this folding. As for doing the reverse folding only when it is cheaper: I want to make the two directions consistent to avoid confusion. It is confusing that we sometimes do the folding when it is neutral and sometimes not. The current implementation implies that we prefer "ADD" to "SUB". I am not sure whether that is by design, as I don't see the reason for it. Any comments are welcome.

RKSimon added inline comments. Feb 26 2020, 2:47 AM

llvm/test/CodeGen/X86/dag-fmf-cse.ll
Line 14: Annoying - any chance you can investigate?

In D75157#1892994, @steven.zhang wrote:

In D75157#1892866, @lebedev.ri wrote:

The test changes don't immediately stand out to me as improvements.
I'd think the fix should go in other direction.

Yes, no improvement as we know that it is neutral. And I don't see the improvement case also from the test change if we do this folding.

Regarding to we only do the reverse folding when it is cheaper,
I want to make them consistent to avoid the confusion no matter which direction.

That was my suggestion, yes. Change the other fold to be consistent with this one.

It is confusing that sometimes we do the folding if it is neutral and sometime not. The current implementation imply that, we prefer the "ADD" to "SUB".

That does sound sane to me, we certainly do have such a preference at least for integers, at least in middle-end.

I am not sure that it is by the design as I don't see the reason for that. Welcome for any comments.

Loss of commutativity by going from fadd to fsub will likely cause register pressure regressions.
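A small illustration of the commutativity point, assuming it refers to operand order: an add may have its operands swapped freely, which gives later passes more freedom in operand placement, while a subtract may not. The function names are made up for this sketch, and it says nothing about any particular target's register allocation.

```cpp
#include <cassert>

// fadd is commutative: either operand order produces the same value.
bool addCommutes(double A, double B) { return A + B == B + A; }

// fsub is not: swapping the operands flips the sign of a non-zero result.
bool subCommutes(double A, double B) { return A - B == B - A; }

// For non-NaN inputs, A - (-B) and A + B produce the same value, because
// negation is exact. The fold under review is therefore a question of cost
// and instruction form, not of numerical result.
bool foldPreservesValue(double A, double B) { return A - (-B) == A + B; }
```

For example, with `A = 1.5` and `B = 2.25`, `addCommutes` and `foldPreservesValue` hold while `subCommutes` does not.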

In D75157#1892994, @steven.zhang wrote:

In D75157#1892866, @lebedev.ri wrote:

The test changes don't immediately stand out to me as improvements.
I'd think the fix should go in other direction.

I want to make them consistent to avoid the confusion no matter which direction.

In D75157#1893099, @RKSimon wrote:

Loss of commutativity by going from fadd to fsub will likely cause register pressure regressions.

I'm not very active wrt fp side of things, but i would almost think we need something like this instead

// unfold (fsub A, B) -> (fadd A, (fneg B))
if ((!LegalOperations || TLI.isOperationLegalOrCustom(ISD::FADD, VT)) &&
    TLI.getNegatibleCost(N1, DAG, LegalOperations, ForCodeSize) !=
        TargetLowering::NegatibleCost::Expensive)
  return DAG.getNode(
      ISD::FADD, DL, VT, N0,
      TLI.getNegatedExpression(N1, DAG, LegalOperations, ForCodeSize), Flags);

arsenm added inline comments. Feb 26 2020, 7:16 AM

llvm/test/CodeGen/AMDGPU/fmuladd.f16.ll
Line 198: This is a regression. The negate should have been pulled out since it folds into the user for smaller code size

In D75157#1893398, @lebedev.ri wrote:
In D75157#1892994, @steven.zhang wrote:

In D75157#1892866, @lebedev.ri wrote:

The test changes don't immediately stand out to me as improvements.
I'd think the fix should go in other direction.

I want to make them consistent to avoid the confusion no matter which direction.

In D75157#1893099, @RKSimon wrote:

Loss of commutativity by going from fadd to fsub will likely cause register pressure regressions.

I'm not very active wrt fp side of things, but i would almost think we need something like this instead
// unfold (fsub A, B) -> (fadd A, (fneg B))
if ((!LegalOperations || TLI.isOperationLegalOrCustom(ISD::FADD, VT)) &&
    TLI.getNegatibleCost(N1, DAG, LegalOperations, ForCodeSize) !=
        TargetLowering::NegatibleCost::Expensive)
  return DAG.getNode(
      ISD::FADD, DL, VT, N0,
      TLI.getNegatedExpression(N1, DAG, LegalOperations, ForCodeSize), Flags);

Right, and we do that already.
Then i'm not sure there is anything to fix here..

It seems that we do see some regressions, and 'add' has some benefit for register pressure over 'sub' because it is commutable. I will abandon this revision. Thank you for all the comments.

Revision Contents

Path                                                      Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp             8 lines
llvm/test/CodeGen/AMDGPU/ (6 files, names not shown)      8, 6, 8, 14, 4, 6 lines
llvm/test/CodeGen/PowerPC/qpx-recipest.ll                 16 lines
llvm/test/CodeGen/X86/ (6 files, names not shown)         9, 36, 56, 7, 2, 36 lines
llvm/test/CodeGen/X86/load-scalar-as-vector.ll            4 lines
llvm/test/CodeGen/X86/negative-sin.ll                     3 lines
llvm/test/CodeGen/X86/pr44749.ll                          2 lines
llvm/test/CodeGen/X86/vec_ss_load_fold.ll                 20 lines

Diff 246636

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


 SDValue DAGCombiner::visitFSUB(SDNode *N) {
 [...]
   // (fsub -0.0, N1) -> -N1
   // NOTE: It is safe to transform an FSUB(-0.0,X) into an FNEG(X), since the
   //       FSUB does not specify the sign bit of a NaN. Also note that for
   //       the same reason, the inverse transform is not safe, unless fast math
   //       flags are in play.
   if (N0CFP && N0CFP->isZero()) {
     if (N0CFP->isNegative() ||
         (Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros())) {
-      if (TLI.getNegatibleCost(N1, DAG, LegalOperations, ForCodeSize) !=
-          TargetLowering::NegatibleCost::Expensive)
+      if (TLI.getNegatibleCost(N1, DAG, LegalOperations, ForCodeSize) ==
+          TargetLowering::NegatibleCost::Cheaper)
         return TLI.getNegatedExpression(N1, DAG, LegalOperations, ForCodeSize);
       if (!LegalOperations || TLI.isOperationLegal(ISD::FNEG, VT))
         return DAG.getNode(ISD::FNEG, DL, VT, N1, Flags);
     }
   }

   if (((Options.UnsafeFPMath && Options.NoSignedZerosFPMath) ||
        (Flags.hasAllowReassociation() && Flags.hasNoSignedZeros())) &&
       N1.getOpcode() == ISD::FADD) {
     // X - (X + Y) -> -Y
     if (N0 == N1->getOperand(0))
       return DAG.getNode(ISD::FNEG, DL, VT, N1->getOperand(1), Flags);
     // X - (Y + X) -> -Y
     if (N0 == N1->getOperand(1))
       return DAG.getNode(ISD::FNEG, DL, VT, N1->getOperand(0), Flags);
   }

   // fold (fsub A, (fneg B)) -> (fadd A, B)
-  if (TLI.getNegatibleCost(N1, DAG, LegalOperations, ForCodeSize) !=
-      TargetLowering::NegatibleCost::Expensive)
+  if (TLI.getNegatibleCost(N1, DAG, LegalOperations, ForCodeSize) ==
+      TargetLowering::NegatibleCost::Cheaper)
     return DAG.getNode(
         ISD::FADD, DL, VT, N0,
         TLI.getNegatedExpression(N1, DAG, LegalOperations, ForCodeSize), Flags);

   // FSUB -> FMA combines:
   if (SDValue Fused = visitFSUBForFMACombine(N)) {
     AddToWorklist(Fused.getNode());
     return Fused;
 [...]
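The guards in the visitFSUB hunk (a negative-zero constant or no-signed-zeros for the fneg rewrite, reassociation for the X - (X + Y) rewrite) can be checked numerically. The following is a minimal sketch in plain C++ `double` arithmetic, assuming IEEE-754 semantics and no fast-math compiler flags; the helper names are invented for this example.

```cpp
#include <cassert>
#include <cmath>

// (fsub -0.0, X) -> (fneg X): value and sign agree, including for X == +/-0.0.
bool negZeroMinusMatchesFNeg(double X) {
  double Sub = -0.0 - X;
  double Neg = -X;
  return Sub == Neg && std::signbit(Sub) == std::signbit(Neg);
}

// (fsub +0.0, X) -> (fneg X): the sign differs for X == +0.0, which is why the
// rewrite needs a negative-zero constant or the no-signed-zeros guarantee.
bool posZeroMinusMatchesFNeg(double X) {
  double Sub = 0.0 - X;
  double Neg = -X;
  return Sub == Neg && std::signbit(Sub) == std::signbit(Neg);
}

// X - (X + Y) -> -Y: only exact when X + Y does not round away part of Y,
// hence the reassociation requirement.
bool reassocMatches(double X, double Y) { return X - (X + Y) == -Y; }
```

With `X = +0.0`, `0.0 - X` is `+0.0` while `-X` is `-0.0`, so `posZeroMinusMatchesFNeg(0.0)` is false; with `X = 1e20` and `Y = 1.0`, the sum rounds back to `1e20` and `reassocMatches` is false as well.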

llvm/test/CodeGen/AMDGPU/fma-combine.ll

 define amdgpu_kernel void @test_f32_mul_y_sub_negone_x(float addrspace(1)* %out,
 [...]
   %y = load float, float addrspace(1)* %in2
   %s = fsub float -1.0, %x
   %m = fmul float %y, %s
   store float %m, float addrspace(1)* %out
   ret void
 }

 ; FUNC-LABEL: {{^}}test_f32_mul_sub_x_one_y:
-; SI-NOFMA: v_add_f32_e32 [[VS:v[0-9]]], -1.0, [[VX:v[0-9]]]
+; SI-NOFMA: v_subrev_f32_e32 [[VS:v[0-9]]], 1.0, [[VX:v[0-9]]]
 ; SI-NOFMA: v_mul_f32_e32 {{v[0-9]}}, [[VS]], [[VY:v[0-9]]]
 ;
 ; SI-FMA: v_fma_f32 {{v[0-9]}}, [[VX:v[0-9]]], [[VY:v[0-9]]], -[[VY:v[0-9]]]
 define amdgpu_kernel void @test_f32_mul_sub_x_one_y(float addrspace(1)* %out,
                                         float addrspace(1)* %in1,
                                         float addrspace(1)* %in2) {
   %x = load float, float addrspace(1)* %in1
   %y = load float, float addrspace(1)* %in2
   %s = fsub float %x, 1.0
   %m = fmul float %s, %y
   store float %m, float addrspace(1)* %out
   ret void
 }

 ; FUNC-LABEL: {{^}}test_f32_mul_y_sub_x_one:
-; SI-NOFMA: v_add_f32_e32 [[VS:v[0-9]]], -1.0, [[VX:v[0-9]]]
+; SI-NOFMA: v_subrev_f32_e32 [[VS:v[0-9]]], 1.0, [[VX:v[0-9]]]
 ; SI-NOFMA: v_mul_f32_e32 {{v[0-9]}}, [[VY:v[0-9]]], [[VS]]
 ;
 ; SI-FMA: v_fma_f32 {{v[0-9]}}, [[VX:v[0-9]]], [[VY:v[0-9]]], -[[VY:v[0-9]]]
 define amdgpu_kernel void @test_f32_mul_y_sub_x_one(float addrspace(1)* %out,
                                         float addrspace(1)* %in1,
                                         float addrspace(1)* %in2) {
   %x = load float, float addrspace(1)* %in1
   %y = load float, float addrspace(1)* %in2
   %s = fsub float %x, 1.0
   %m = fmul float %y, %s
   store float %m, float addrspace(1)* %out
   ret void
 }

 ; FUNC-LABEL: {{^}}test_f32_mul_sub_x_negone_y:
-; SI-NOFMA: v_add_f32_e32 [[VS:v[0-9]]], 1.0, [[VX:v[0-9]]]
+; SI-NOFMA: v_subrev_f32_e32 [[VS:v[0-9]]], -1.0, [[VX:v[0-9]]]
 ; SI-NOFMA: v_mul_f32_e32 {{v[0-9]}}, [[VS]], [[VY:v[0-9]]]
 ;
 ; SI-FMA: v_fma_f32 {{v[0-9]}}, [[VX:v[0-9]]], [[VY:v[0-9]]], [[VY:v[0-9]]]
 define amdgpu_kernel void @test_f32_mul_sub_x_negone_y(float addrspace(1)* %out,
                                         float addrspace(1)* %in1,
                                         float addrspace(1)* %in2) {
   %x = load float, float addrspace(1)* %in1
   %y = load float, float addrspace(1)* %in2
   %s = fsub float %x, -1.0
   %m = fmul float %s, %y
   store float %m, float addrspace(1)* %out
   ret void
 }

 ; FUNC-LABEL: {{^}}test_f32_mul_y_sub_x_negone:
-; SI-NOFMA: v_add_f32_e32 [[VS:v[0-9]]], 1.0, [[VX:v[0-9]]]
+; SI-NOFMA: v_subrev_f32_e32 [[VS:v[0-9]]], -1.0, [[VX:v[0-9]]]
 ; SI-NOFMA: v_mul_f32_e32 {{v[0-9]}}, [[VY:v[0-9]]], [[VS]]
 ;
 ; SI-FMA: v_fma_f32 {{v[0-9]}}, [[VX:v[0-9]]], [[VY:v[0-9]]], [[VY:v[0-9]]]
 define amdgpu_kernel void @test_f32_mul_y_sub_x_negone(float addrspace(1)* %out,
                                         float addrspace(1)* %in1,
                                         float addrspace(1)* %in2) {
   %x = load float, float addrspace(1)* %in1
   %y = load float, float addrspace(1)* %in2
 [...]

llvm/test/CodeGen/AMDGPU/fmuladd.f16.ll

 define amdgpu_kernel void @fmuladd_neg_2.0_a_b_f16(half addrspace(1)* %out, half addrspace(1)* %in) #0 {
 [...]
   %r3 = tail call half @llvm.fmuladd.f16(half -2.0, half %r1, half %r2)
   store half %r3, half addrspace(1)* %gep.out
   ret void
 }

 ; GCN-LABEL: {{^}}fmuladd_neg_2.0_neg_a_b_f16
 ; GCN: {{buffer|flat|global}}_load_ushort [[R1:v[0-9]+]],
 ; GCN: {{buffer|flat|global}}_load_ushort [[R2:v[0-9]+]],
-; VI-FLUSH: v_mac_f16_e32 [[R2]], 2.0, [[R1]]
+; VI-FLUSH: v_mad_f16 [[R2]], -[[R1]], -2.0, [[R2]]
   [inline comment, arsenm] This is a regression. The negate should have been pulled out since it folds into the user for smaller code size
 ; VI-FLUSH: flat_store_short v{{\[[0-9]+:[0-9]+\]}}, [[R2]]

 ; VI-DENORM: v_fma_f16 [[RESULT:v[0-9]+]], [[R1]], 2.0, [[R2]]
 ; VI-DENORM: flat_store_short v{{\[[0-9]+:[0-9]+\]}}, [[RESULT]]

-; GFX10-FLUSH: v_add_f16_e32 [[MUL2:v[0-9]+]], [[R1]], [[R1]]
-; GFX10-FLUSH: v_add_f16_e32 [[RESULT:v[0-9]+]], [[R2]], [[MUL2]]
+; GFX10-FLUSH: v_mul_f16_e32 [[MUL2:v[0-9]+]], -2.0, [[R1]]
+; GFX10-FLUSH: v_sub_f16_e32 [[RESULT:v[0-9]+]], [[R2]], [[MUL2]]
 ; GFX10-FLUSH: global_store_short v{{\[[0-9]+:[0-9]+\]}}, [[RESULT]]

 ; GFX10-DENORM: v_fmac_f16_e32 [[R2]], 2.0, [[R1]]
 ; GFX10-DENORM: global_store_short v{{\[[0-9]+:[0-9]+\]}}, [[R2]]
 define amdgpu_kernel void @fmuladd_neg_2.0_neg_a_b_f16(half addrspace(1)* %out, half addrspace(1)* %in) #0 {
   %tid = call i32 @llvm.amdgcn.workitem.id.x()
   %gep.0 = getelementptr half, half addrspace(1)* %out, i32 %tid
   %gep.1 = getelementptr half, half addrspace(1)* %gep.0, i32 1
 [...]

llvm/test/CodeGen/AMDGPU/fmuladd.f32.ll

 define amdgpu_kernel void @fmuladd_neg_2.0_a_b_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {
 [...]
   ret void
 }

 ; XXX
 ; GCN-LABEL: {{^}}fmuladd_neg_2.0_neg_a_b_f32
 ; GCN: {{buffer|flat|global}}_load_dword [[R1:v[0-9]+]],
 ; GCN: {{buffer|flat|global}}_load_dword [[R2:v[0-9]+]],

-; GCN-FLUSH-MAD: v_mac_f32_e32 [[R2]], 2.0, [[R1]]
+; GCN-FLUSH-MAD: v_mad_f32 [[R1]], -[[R1]], -2.0, [[R2]]
 ; GCN-FLUSH-FMAC: v_fmac_f32_e32 [[R2]], 2.0, [[R1]]

-; SI-FLUSH: buffer_store_dword [[R2]]
+; SI-FLUSH: buffer_store_dword [[R1]]
 ; VI-FLUSH: {{global|flat}}_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[R2]]

 ; GCN-DENORM-FASTFMA: v_fma_f32 [[RESULT:v[0-9]+]], [[R1]], 2.0, [[R2]]

-; GCN-DENORM-SLOWFMA: v_add_f32_e32 [[TMP:v[0-9]+]], [[R1]], [[R1]]
-; GCN-DENORM-SLOWFMA: v_add_f32_e32 [[RESULT:v[0-9]+]], [[R2]], [[TMP]]
+; GCN-DENORM-SLOWFMA: v_mul_f32_e32 [[TMP:v[0-9]+]], -2.0, [[R1]]
+; GCN-DENORM-SLOWFMA: v_sub_f32_e32 [[RESULT:v[0-9]+]], [[R2]], [[TMP]]

 ; SI-DENORM: buffer_store_dword [[RESULT]]
 ; VI-DENORM: {{global|flat}}_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[RESULT]]
 define amdgpu_kernel void @fmuladd_neg_2.0_neg_a_b_f32(float addrspace(1)* %out, float addrspace(1)* %in) #0 {
   %tid = call i32 @llvm.amdgcn.workitem.id.x()
   %gep.0 = getelementptr float, float addrspace(1)* %out, i32 %tid
   %gep.1 = getelementptr float, float addrspace(1)* %gep.0, i32 1
   %gep.out = getelementptr float, float addrspace(1)* %out, i32 %tid
 [...]

llvm/test/CodeGen/AMDGPU/fsub.f16.ll

 entry:
 [...]
   %r.val = fsub half 1.0, %b.val
   store half %r.val, half addrspace(1)* %r
   ret void
 }

 ; GCN-LABEL: {{^}}fsub_f16_imm_b:
 ; GCN: buffer_load_ushort v[[A_F16:[0-9]+]]
 ; SI: v_cvt_f32_f16_e32 v[[A_F32:[0-9]+]], v[[A_F16]]
-; SI: v_add_f32_e32 v[[R_F32:[0-9]+]], -2.0, v[[A_F32]]
+; SI: v_subrev_f32_e32 v[[R_F32:[0-9]+]], 2.0, v[[A_F32]]
 ; SI: v_cvt_f16_f32_e32 v[[R_F16:[0-9]+]], v[[R_F32]]
-; GFX89: v_add_f16_e32 v[[R_F16:[0-9]+]], -2.0, v[[A_F16]]
+; GFX89: v_subrev_f16_e32 v[[R_F16:[0-9]+]], 2.0, v[[A_F16]]
 ; GCN: buffer_store_short v[[R_F16]]
 ; GCN: s_endpgm
 define amdgpu_kernel void @fsub_f16_imm_b(
     half addrspace(1)* %r,
     half addrspace(1)* %a) {
 entry:
   %a.val = load volatile half, half addrspace(1)* %a
   %r.val = fsub half %a.val, 2.0
 [...]
 }

 ; GCN-LABEL: {{^}}fsub_v2f16_imm_b:
 ; GCN-DAG: buffer_load_dword v[[A_V2_F16:[0-9]+]]

 ; SI-DAG: v_cvt_f32_f16_e32 v[[A_F32_0:[0-9]+]], v[[A_V2_F16]]
 ; SI-DAG: v_lshrrev_b32_e32 v[[A_F16_1:[0-9]+]], 16, v[[A_V2_F16]]
 ; SI-DAG: v_cvt_f32_f16_e32 v[[A_F32_1:[0-9]+]], v[[A_F16_1]]
-; SI-DAG: v_add_f32_e32 v[[R_F32_0:[0-9]+]], -2.0, v[[A_F32_0]]
+; SI-DAG: v_subrev_f32_e32 v[[R_F32_0:[0-9]+]], 2.0, v[[A_F32_0]]
 ; SI-DAG: v_cvt_f16_f32_e32 v[[R_F16_0:[0-9]+]], v[[R_F32_0]]
-; SI-DAG: v_add_f32_e32 v[[R_F32_1:[0-9]+]], -1.0, v[[A_F32_1]]
+; SI-DAG: v_subrev_f32_e32 v[[R_F32_1:[0-9]+]], 1.0, v[[A_F32_1]]
 ; SI-DAG: v_cvt_f16_f32_e32 v[[R_F16_1:[0-9]+]], v[[R_F32_1]]
 ; SI-DAG: v_lshlrev_b32_e32 v[[R_F16_HI:[0-9]+]], 16, v[[R_F16_1]]
 ; SI: v_or_b32_e32 v[[R_V2_F16:[0-9]+]], v[[R_F16_0]], v[[R_F16_HI]]

-; VI-DAG: v_mov_b32_e32 [[CONSTM1:v[0-9]+]], 0xbc00
-; VI-DAG: v_add_f16_sdwa v[[R_F16_HI:[0-9]+]], v[[A_V2_F16]], [[CONSTM1]] dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
-; VI-DAG: v_add_f16_e32 v[[R_F16_0:[0-9]+]], -2.0, v[[A_V2_F16]]
+; VI-DAG: v_mov_b32_e32 [[CONSTM1:v[0-9]+]], 0x3c00
+; VI-DAG: v_sub_f16_sdwa v[[R_F16_HI:[0-9]+]], v[[A_V2_F16]], [[CONSTM1]] dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; VI-DAG: v_subrev_f16_e32 v[[R_F16_0:[0-9]+]], 2.0, v[[A_V2_F16]]
 ; VI: v_or_b32_e32 v[[R_V2_F16:[0-9]+]], v[[R_F16_0]], v[[R_F16_HI]]

 ; GFX9: s_mov_b32 [[K:s[0-9]+]], 0xbc00c000
 ; GFX9: v_pk_add_f16 v[[R_V2_F16:[0-9]+]], v[[A_V2_F16]], [[K]]{{$}}

 ; GCN: buffer_store_dword v[[R_V2_F16]]
 ; GCN: s_endpgm

 [...]

llvm/test/CodeGen/AMDGPU/reduction.ll

 [...]
 ; GCN-LABEL: {{^}}reduction_fsub_v4f16_preserve_fmf:
 ; GFX9: s_waitcnt
 ; GFX9-NEXT: v_pk_add_f16 v0, v0, v1 neg_lo:[0,1] neg_hi:[0,1]{{$}}
 ; GFX9-NEXT: v_sub_f16_sdwa v0, v0, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
 ; GFX9-NEXT: s_setpc_b64

 ; VI: s_waitcnt
 ; VI-NEXT: v_sub_f16_sdwa v2, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; VI-NEXT: v_sub_f16_e32 v0, v1, v0
-; VI-NEXT: v_add_f16_e32 v0, v2, v0
+; VI-NEXT: v_sub_f16_e32 v0, v0, v1
+; VI-NEXT: v_sub_f16_e32 v0, v2, v0
 ; VI-NEXT: s_setpc_b64
 define half @reduction_fsub_v4f16_preserve_fmf(<4 x half> %vec4) {
 entry:
   %rdx.shuf = shufflevector <4 x half> %vec4, <4 x half> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
   %bin.rdx = fsub nsz <4 x half> %vec4, %rdx.shuf
   %rdx.shuf1 = shufflevector <4 x half> %bin.rdx, <4 x half> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
   %bin.rdx2 = fsub nsz <4 x half> %bin.rdx, %rdx.shuf1
   %res = extractelement <4 x half> %bin.rdx2, i32 0
 [...]

llvm/test/CodeGen/AMDGPU/v_mac.ll

 [...]
 ; Without special casing the inline constant check for v_mac_f32's
 ; src2, this fails to fold the 1.0 into a mad.

 ; GCN-LABEL: {{^}}fold_inline_imm_into_mac_src2_f32:
 ; GCN: {{buffer|flat}}_load_dword [[A:v[0-9]+]]
 ; GCN: {{buffer|flat}}_load_dword [[B:v[0-9]+]]

 ; GCN: v_add_f32_e32 [[TMP2:v[0-9]+]], [[A]], [[A]]
-; GCN: v_mad_f32 v{{[0-9]+}}, [[TMP2]], -4.0, 1.0
+; GCN: v_mad_f32 v{{[0-9]+}}, -[[TMP2]], 4.0, 1.0
 define amdgpu_kernel void @fold_inline_imm_into_mac_src2_f32(float addrspace(1)* %out, float addrspace(1)* %a, float addrspace(1)* %b) #3 {
 bb:
   %tid = call i32 @llvm.amdgcn.workitem.id.x()
   %tid.ext = sext i32 %tid to i64
   %gep.a = getelementptr inbounds float, float addrspace(1)* %a, i64 %tid.ext
   %gep.b = getelementptr inbounds float, float addrspace(1)* %b, i64 %tid.ext
   %gep.out = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
   %tmp = load volatile float, float addrspace(1)* %gep.a
 [...]
 ; GCN-LABEL: {{^}}fold_inline_imm_into_mac_src2_f16:
 ; GCN: {{buffer|flat}}_load_ushort [[A:v[0-9]+]]
 ; GCN: {{buffer|flat}}_load_ushort [[B:v[0-9]+]]

 ; SI-DAG: v_cvt_f32_f16_e32 [[CVT_A:v[0-9]+]], [[A]]
 ; SI-DAG: v_cvt_f32_f16_e32 [[CVT_B:v[0-9]+]], [[B]]

 ; SI: v_add_f32_e32 [[TMP2:v[0-9]+]], [[CVT_A]], [[CVT_A]]
-; SI: v_mad_f32 v{{[0-9]+}}, [[TMP2]], -4.0, 1.0
+; SI: v_mad_f32 v{{[0-9]+}}, -[[TMP2]], 4.0, 1.0
 ; SI: v_mac_f32_e32 v{{[0-9]+}}, 0x41000000, v{{[0-9]+}}

 ; VI-FLUSH: v_add_f16_e32 [[TMP2:v[0-9]+]], [[A]], [[A]]
-; VI-FLUSH: v_mad_f16 v{{[0-9]+}}, [[TMP2]], -4.0, 1.0
+; VI-FLUSH: v_mad_f16 v{{[0-9]+}}, -[[TMP2]], 4.0, 1.0
 define amdgpu_kernel void @fold_inline_imm_into_mac_src2_f16(half addrspace(1)* %out, half addrspace(1)* %a, half addrspace(1)* %b) #3 {
 bb:
   %tid = call i32 @llvm.amdgcn.workitem.id.x()
   %tid.ext = sext i32 %tid to i64
   %gep.a = getelementptr inbounds half, half addrspace(1)* %a, i64 %tid.ext
   %gep.b = getelementptr inbounds half, half addrspace(1)* %b, i64 %tid.ext
   %gep.out = getelementptr inbounds half, half addrspace(1)* %out, i64 %tid.ext
   %tmp = load volatile half, half addrspace(1)* %gep.a
 [...]

llvm/test/CodeGen/PowerPC/qpx-recipest.ll

	Show First 20 Lines • Show All 59 Lines • ▼ Show 20 Lines
	define <4 x double> @foof_fmf(<4 x double> %a, <4 x float> %b) nounwind {			define <4 x double> @foof_fmf(<4 x double> %a, <4 x float> %b) nounwind {
	; CHECK-LABEL: foof_fmf:			; CHECK-LABEL: foof_fmf:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: addis 3, 2, .LCPI2_0@toc@ha			; CHECK-NEXT: addis 3, 2, .LCPI2_0@toc@ha
	; CHECK-NEXT: qvfrsqrtes 3, 2			; CHECK-NEXT: qvfrsqrtes 3, 2
	; CHECK-NEXT: addi 3, 3, .LCPI2_0@toc@l			; CHECK-NEXT: addi 3, 3, .LCPI2_0@toc@l
	; CHECK-NEXT: qvlfsx 0, 0, 3			; CHECK-NEXT: qvlfsx 0, 0, 3
	; CHECK-NEXT: qvfmuls 4, 3, 3			; CHECK-NEXT: qvfmuls 4, 3, 3
	; CHECK-NEXT: qvfnmsubs 2, 2, 0, 2			; CHECK-NEXT: qvfmsubs 2, 2, 0, 2
	; CHECK-NEXT: qvfmadds 0, 2, 4, 0			; CHECK-NEXT: qvfnmsubs 0, 2, 4, 0
	; CHECK-NEXT: qvfmuls 0, 3, 0			; CHECK-NEXT: qvfmuls 0, 3, 0
	; CHECK-NEXT: qvfmul 1, 1, 0			; CHECK-NEXT: qvfmul 1, 1, 0
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	entry:			entry:
	%x = call fast <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)			%x = call fast <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)
	%y = fpext <4 x float> %x to <4 x double>			%y = fpext <4 x float> %x to <4 x double>
	%r = fdiv fast <4 x double> %a, %y			%r = fdiv fast <4 x double> %a, %y
	ret <4 x double> %r			ret <4 x double> %r
	▲ Show 20 Lines • Show All 99 Lines • ▼ Show 20 Lines
	define <4 x float> @goo_fmf(<4 x float> %a, <4 x float> %b) nounwind {			define <4 x float> @goo_fmf(<4 x float> %a, <4 x float> %b) nounwind {
	; CHECK-LABEL: goo_fmf:			; CHECK-LABEL: goo_fmf:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: addis 3, 2, .LCPI6_0@toc@ha			; CHECK-NEXT: addis 3, 2, .LCPI6_0@toc@ha
	; CHECK-NEXT: qvfrsqrtes 3, 2			; CHECK-NEXT: qvfrsqrtes 3, 2
	; CHECK-NEXT: addi 3, 3, .LCPI6_0@toc@l			; CHECK-NEXT: addi 3, 3, .LCPI6_0@toc@l
	; CHECK-NEXT: qvlfsx 0, 0, 3			; CHECK-NEXT: qvlfsx 0, 0, 3
	; CHECK-NEXT: qvfmuls 4, 3, 3			; CHECK-NEXT: qvfmuls 4, 3, 3
	; CHECK-NEXT: qvfnmsubs 2, 2, 0, 2			; CHECK-NEXT: qvfmsubs 2, 2, 0, 2
	; CHECK-NEXT: qvfmadds 0, 2, 4, 0			; CHECK-NEXT: qvfnmsubs 0, 2, 4, 0
	; CHECK-NEXT: qvfmuls 0, 3, 0			; CHECK-NEXT: qvfmuls 0, 3, 0
	; CHECK-NEXT: qvfmuls 1, 1, 0			; CHECK-NEXT: qvfmuls 1, 1, 0
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	entry:			entry:
	%x = call fast <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)			%x = call fast <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)
	%r = fdiv fast <4 x float> %a, %x			%r = fdiv fast <4 x float> %a, %x
	ret <4 x float> %r			ret <4 x float> %r
	}			}
	[... 208 lines not shown ...]
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: addis 3, 2, .LCPI16_1@toc@ha			; CHECK-NEXT: addis 3, 2, .LCPI16_1@toc@ha
	; CHECK-NEXT: qvfrsqrtes 2, 1			; CHECK-NEXT: qvfrsqrtes 2, 1
	; CHECK-NEXT: addi 3, 3, .LCPI16_1@toc@l			; CHECK-NEXT: addi 3, 3, .LCPI16_1@toc@l
	; CHECK-NEXT: qvlfsx 0, 0, 3			; CHECK-NEXT: qvlfsx 0, 0, 3
	; CHECK-NEXT: addis 3, 2, .LCPI16_0@toc@ha			; CHECK-NEXT: addis 3, 2, .LCPI16_0@toc@ha
	; CHECK-NEXT: addi 3, 3, .LCPI16_0@toc@l			; CHECK-NEXT: addi 3, 3, .LCPI16_0@toc@l
	; CHECK-NEXT: qvfmuls 4, 2, 2			; CHECK-NEXT: qvfmuls 4, 2, 2
	; CHECK-NEXT: qvfnmsubs 3, 1, 0, 1			; CHECK-NEXT: qvfmsubs 3, 1, 0, 1
	; CHECK-NEXT: qvfmadds 0, 3, 4, 0			; CHECK-NEXT: qvfnmsubs 0, 3, 4, 0
	; CHECK-NEXT: qvlfsx 3, 0, 3			; CHECK-NEXT: qvlfsx 3, 0, 3
	; CHECK-NEXT: addis 3, 2, .LCPI16_2@toc@ha			; CHECK-NEXT: addis 3, 2, .LCPI16_2@toc@ha
	; CHECK-NEXT: addi 3, 3, .LCPI16_2@toc@l			; CHECK-NEXT: addi 3, 3, .LCPI16_2@toc@l
	; CHECK-NEXT: qvlfsx 4, 0, 3			; CHECK-NEXT: qvlfsx 4, 0, 3
	; CHECK-NEXT: qvfmuls 0, 2, 0			; CHECK-NEXT: qvfmuls 0, 2, 0
	; CHECK-NEXT: qvfabs 2, 1			; CHECK-NEXT: qvfabs 2, 1
	; CHECK-NEXT: qvfmuls 0, 0, 1			; CHECK-NEXT: qvfmuls 0, 0, 1
	; CHECK-NEXT: qvfcmplt 1, 2, 3			; CHECK-NEXT: qvfcmplt 1, 2, 3
	[... 9 lines not shown ...]
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: addis 3, 2, .LCPI17_1@toc@ha			; CHECK-NEXT: addis 3, 2, .LCPI17_1@toc@ha
	; CHECK-NEXT: qvfrsqrtes 2, 1			; CHECK-NEXT: qvfrsqrtes 2, 1
	; CHECK-NEXT: addi 3, 3, .LCPI17_1@toc@l			; CHECK-NEXT: addi 3, 3, .LCPI17_1@toc@l
	; CHECK-NEXT: qvlfsx 0, 0, 3			; CHECK-NEXT: qvlfsx 0, 0, 3
	; CHECK-NEXT: addis 3, 2, .LCPI17_0@toc@ha			; CHECK-NEXT: addis 3, 2, .LCPI17_0@toc@ha
	; CHECK-NEXT: addi 3, 3, .LCPI17_0@toc@l			; CHECK-NEXT: addi 3, 3, .LCPI17_0@toc@l
	; CHECK-NEXT: qvfmuls 4, 2, 2			; CHECK-NEXT: qvfmuls 4, 2, 2
	; CHECK-NEXT: qvfnmsubs 3, 1, 0, 1			; CHECK-NEXT: qvfmsubs 3, 1, 0, 1
	; CHECK-NEXT: qvfmadds 0, 3, 4, 0			; CHECK-NEXT: qvfnmsubs 0, 3, 4, 0
	; CHECK-NEXT: qvlfsx 3, 0, 3			; CHECK-NEXT: qvlfsx 3, 0, 3
	; CHECK-NEXT: qvfmuls 0, 2, 0			; CHECK-NEXT: qvfmuls 0, 2, 0
	; CHECK-NEXT: qvfmuls 0, 0, 1			; CHECK-NEXT: qvfmuls 0, 0, 1
	; CHECK-NEXT: qvfcmpeq 1, 1, 3			; CHECK-NEXT: qvfcmpeq 1, 1, 3
	; CHECK-NEXT: qvfsel 1, 1, 3, 0			; CHECK-NEXT: qvfsel 1, 1, 3, 0
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	entry:			entry:
	%r = call fast <4 x float> @llvm.sqrt.v4f32(<4 x float> %a)			%r = call fast <4 x float> @llvm.sqrt.v4f32(<4 x float> %a)
	[... 26 lines not shown ...]

llvm/test/CodeGen/X86/dag-fmf-cse.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=fma -enable-unsafe-fp-math | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=fma -enable-unsafe-fp-math | FileCheck %s

	; If fast-math-flags are propagated correctly, the mul1 expression			; If fast-math-flags are propagated correctly, the mul1 expression
	; should be recognized as a factor in the last fsub, so we should			; should be recognized as a factor in the last fsub, so we should
	; see a mul and add, not a mul and fma:			; see a mul and add, not a mul and fma:
	; a * b - (-a * b) ---> (a * b) + (a * b)			; a * b - (-a * b) ---> (a * b) + (a * b)

	define float @fmf_should_not_break_cse(float %a, float %b) {			define float @fmf_should_not_break_cse(float %a, float %b) {
	; CHECK-LABEL: fmf_should_not_break_cse:			; CHECK-LABEL: fmf_should_not_break_cse:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vmulss %xmm1, %xmm0, %xmm0			; CHECK-NEXT: vxorps {{.*}}(%rip), %xmm0, %xmm2
	; CHECK-NEXT: vaddss %xmm0, %xmm0, %xmm0			; CHECK-NEXT: vmulss %xmm1, %xmm2, %xmm2
				; CHECK-NEXT: vfmsub213ss {{.*#+}} xmm0 = (xmm1 * xmm0) - xmm2
				RKSimon: Annoying - any chance you can investigate?
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%mul1 = fmul fast float %a, %b			%mul1 = fmul fast float %a, %b
	%nega = fsub fast float 0.0, %a			%nega = fsub fast float 0.0, %a
	%mul2 = fmul fast float %nega, %b			%mul2 = fmul fast float %nega, %b
	%abx2 = fsub fast float %mul1, %mul2			%abx2 = fsub fast float %mul1, %mul2
	ret float %abx2			ret float %abx2
	}			}

	define <4 x float> @fmf_should_not_break_cse_vector(<4 x float> %a, <4 x float> %b) {			define <4 x float> @fmf_should_not_break_cse_vector(<4 x float> %a, <4 x float> %b) {
	; CHECK-LABEL: fmf_should_not_break_cse_vector:			; CHECK-LABEL: fmf_should_not_break_cse_vector:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vmulps %xmm1, %xmm0, %xmm0			; CHECK-NEXT: vmulps %xmm1, %xmm0, %xmm2
	; CHECK-NEXT: vaddps %xmm0, %xmm0, %xmm0			; CHECK-NEXT: vfmadd213ps {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm2
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%mul1 = fmul fast <4 x float> %a, %b			%mul1 = fmul fast <4 x float> %a, %b
	%nega = fsub fast <4 x float> <float 0.0, float 0.0, float 0.0, float 0.0>, %a			%nega = fsub fast <4 x float> <float 0.0, float 0.0, float 0.0, float 0.0>, %a
	%mul2 = fmul fast <4 x float> %nega, %b			%mul2 = fmul fast <4 x float> %nega, %b
	%abx2 = fsub fast <4 x float> %mul1, %mul2			%abx2 = fsub fast <4 x float> %mul1, %mul2
	ret <4 x float> %abx2			ret <4 x float> %abx2
	}			}
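
The identity named in the test comment, a * b - (-a * b) ---> (a * b) + (a * b), can be checked numerically with a minimal C++ sketch. The function names below are hypothetical and only mirror the IR values `%mul1`, `%mul2` and `%abx2`; this illustrates the arithmetic, not the DAGCombine code.

```cpp
#include <cassert>

// Original form from the test: mul1 - mul2, where mul2 = (-a) * b.
double sub_of_negated_product(double a, double b) {
  double mul1 = a * b;
  double mul2 = (-a) * b;
  return mul1 - mul2;
}

// Folded form: (a * b) + (a * b), which needs only one multiply.
double add_of_shared_product(double a, double b) {
  double mul1 = a * b;
  return mul1 + mul1;
}
```

With the fold, the second multiply disappears because both operands of the add are the same value; without it, the right-hand column above keeps a separate negate, multiply and fused multiply-subtract.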

llvm/test/CodeGen/X86/fma_patterns.ll

[... 1,031 lines not shown ...]
; AVX512-NOINFS-NEXT: retq		; AVX512-NOINFS-NEXT: retq
%s = fsub <4 x float> <float -1.0, float -1.0, float undef, float -1.0>, %x		%s = fsub <4 x float> <float -1.0, float -1.0, float undef, float -1.0>, %x
%m = fmul <4 x float> %y, %s		%m = fmul <4 x float> %y, %s
ret <4 x float> %m		ret <4 x float> %m
}		}

define <4 x float> @test_v4f32_mul_sub_x_one_y(<4 x float> %x, <4 x float> %y) {		define <4 x float> @test_v4f32_mul_sub_x_one_y(<4 x float> %x, <4 x float> %y) {
; FMA-INFS-LABEL: test_v4f32_mul_sub_x_one_y:		; FMA-INFS-LABEL: test_v4f32_mul_sub_x_one_y:
; FMA-INFS: # %bb.0:		; FMA-INFS: # %bb.0:
; FMA-INFS-NEXT: vaddps {{.*}}(%rip), %xmm0, %xmm0		; FMA-INFS-NEXT: vsubps {{.*}}(%rip), %xmm0, %xmm0
; FMA-INFS-NEXT: vmulps %xmm1, %xmm0, %xmm0		; FMA-INFS-NEXT: vmulps %xmm1, %xmm0, %xmm0
; FMA-INFS-NEXT: retq		; FMA-INFS-NEXT: retq
;		;
; FMA4-INFS-LABEL: test_v4f32_mul_sub_x_one_y:		; FMA4-INFS-LABEL: test_v4f32_mul_sub_x_one_y:
; FMA4-INFS: # %bb.0:		; FMA4-INFS: # %bb.0:
; FMA4-INFS-NEXT: vaddps {{.*}}(%rip), %xmm0, %xmm0		; FMA4-INFS-NEXT: vsubps {{.*}}(%rip), %xmm0, %xmm0
; FMA4-INFS-NEXT: vmulps %xmm1, %xmm0, %xmm0		; FMA4-INFS-NEXT: vmulps %xmm1, %xmm0, %xmm0
; FMA4-INFS-NEXT: retq		; FMA4-INFS-NEXT: retq
;		;
; AVX512-INFS-LABEL: test_v4f32_mul_sub_x_one_y:		; AVX512-INFS-LABEL: test_v4f32_mul_sub_x_one_y:
; AVX512-INFS: # %bb.0:		; AVX512-INFS: # %bb.0:
; AVX512-INFS-NEXT: vaddps {{.*}}(%rip){1to4}, %xmm0, %xmm0		; AVX512-INFS-NEXT: vsubps {{.*}}(%rip){1to4}, %xmm0, %xmm0
; AVX512-INFS-NEXT: vmulps %xmm1, %xmm0, %xmm0		; AVX512-INFS-NEXT: vmulps %xmm1, %xmm0, %xmm0
; AVX512-INFS-NEXT: retq		; AVX512-INFS-NEXT: retq
;		;
; FMA-NOINFS-LABEL: test_v4f32_mul_sub_x_one_y:		; FMA-NOINFS-LABEL: test_v4f32_mul_sub_x_one_y:
; FMA-NOINFS: # %bb.0:		; FMA-NOINFS: # %bb.0:
; FMA-NOINFS-NEXT: vfmsub213ps {{.*#+}} xmm0 = (xmm1 * xmm0) - xmm1		; FMA-NOINFS-NEXT: vfmsub213ps {{.*#+}} xmm0 = (xmm1 * xmm0) - xmm1
; FMA-NOINFS-NEXT: retq		; FMA-NOINFS-NEXT: retq
;		;
[... 9 lines not shown ...]
; AVX512-NOINFS-NEXT: retq		; AVX512-NOINFS-NEXT: retq
%s = fsub <4 x float> %x, <float 1.0, float 1.0, float 1.0, float 1.0>		%s = fsub <4 x float> %x, <float 1.0, float 1.0, float 1.0, float 1.0>
%m = fmul <4 x float> %s, %y		%m = fmul <4 x float> %s, %y
ret <4 x float> %m		ret <4 x float> %m
}		}

define <4 x float> @test_v4f32_mul_y_sub_x_one(<4 x float> %x, <4 x float> %y) {		define <4 x float> @test_v4f32_mul_y_sub_x_one(<4 x float> %x, <4 x float> %y) {
; FMA-INFS-LABEL: test_v4f32_mul_y_sub_x_one:		; FMA-INFS-LABEL: test_v4f32_mul_y_sub_x_one:
; FMA-INFS: # %bb.0:		; FMA-INFS: # %bb.0:
; FMA-INFS-NEXT: vaddps {{.*}}(%rip), %xmm0, %xmm0		; FMA-INFS-NEXT: vsubps {{.*}}(%rip), %xmm0, %xmm0
; FMA-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0		; FMA-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0
; FMA-INFS-NEXT: retq		; FMA-INFS-NEXT: retq
;		;
; FMA4-INFS-LABEL: test_v4f32_mul_y_sub_x_one:		; FMA4-INFS-LABEL: test_v4f32_mul_y_sub_x_one:
; FMA4-INFS: # %bb.0:		; FMA4-INFS: # %bb.0:
; FMA4-INFS-NEXT: vaddps {{.*}}(%rip), %xmm0, %xmm0		; FMA4-INFS-NEXT: vsubps {{.*}}(%rip), %xmm0, %xmm0
; FMA4-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0		; FMA4-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0
; FMA4-INFS-NEXT: retq		; FMA4-INFS-NEXT: retq
;		;
; AVX512-INFS-LABEL: test_v4f32_mul_y_sub_x_one:		; AVX512-INFS-LABEL: test_v4f32_mul_y_sub_x_one:
; AVX512-INFS: # %bb.0:		; AVX512-INFS: # %bb.0:
; AVX512-INFS-NEXT: vaddps {{.*}}(%rip){1to4}, %xmm0, %xmm0		; AVX512-INFS-NEXT: vsubps {{.*}}(%rip){1to4}, %xmm0, %xmm0
; AVX512-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0		; AVX512-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0
; AVX512-INFS-NEXT: retq		; AVX512-INFS-NEXT: retq
;		;
; FMA-NOINFS-LABEL: test_v4f32_mul_y_sub_x_one:		; FMA-NOINFS-LABEL: test_v4f32_mul_y_sub_x_one:
; FMA-NOINFS: # %bb.0:		; FMA-NOINFS: # %bb.0:
; FMA-NOINFS-NEXT: vfmsub213ps {{.*#+}} xmm0 = (xmm1 * xmm0) - xmm1		; FMA-NOINFS-NEXT: vfmsub213ps {{.*#+}} xmm0 = (xmm1 * xmm0) - xmm1
; FMA-NOINFS-NEXT: retq		; FMA-NOINFS-NEXT: retq
;		;
[... 9 lines not shown ...]
; AVX512-NOINFS-NEXT: retq		; AVX512-NOINFS-NEXT: retq
%s = fsub <4 x float> %x, <float 1.0, float 1.0, float 1.0, float 1.0>		%s = fsub <4 x float> %x, <float 1.0, float 1.0, float 1.0, float 1.0>
%m = fmul <4 x float> %y, %s		%m = fmul <4 x float> %y, %s
ret <4 x float> %m		ret <4 x float> %m
}		}

define <4 x float> @test_v4f32_mul_y_sub_x_one_undefs(<4 x float> %x, <4 x float> %y) {		define <4 x float> @test_v4f32_mul_y_sub_x_one_undefs(<4 x float> %x, <4 x float> %y) {
; FMA-INFS-LABEL: test_v4f32_mul_y_sub_x_one_undefs:		; FMA-INFS-LABEL: test_v4f32_mul_y_sub_x_one_undefs:
; FMA-INFS: # %bb.0:		; FMA-INFS: # %bb.0:
; FMA-INFS-NEXT: vaddps {{.*}}(%rip), %xmm0, %xmm0		; FMA-INFS-NEXT: vsubps {{.*}}(%rip), %xmm0, %xmm0
; FMA-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0		; FMA-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0
; FMA-INFS-NEXT: retq		; FMA-INFS-NEXT: retq
;		;
; FMA4-INFS-LABEL: test_v4f32_mul_y_sub_x_one_undefs:		; FMA4-INFS-LABEL: test_v4f32_mul_y_sub_x_one_undefs:
; FMA4-INFS: # %bb.0:		; FMA4-INFS: # %bb.0:
; FMA4-INFS-NEXT: vaddps {{.*}}(%rip), %xmm0, %xmm0		; FMA4-INFS-NEXT: vsubps {{.*}}(%rip), %xmm0, %xmm0
; FMA4-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0		; FMA4-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0
; FMA4-INFS-NEXT: retq		; FMA4-INFS-NEXT: retq
;		;
; AVX512-INFS-LABEL: test_v4f32_mul_y_sub_x_one_undefs:		; AVX512-INFS-LABEL: test_v4f32_mul_y_sub_x_one_undefs:
; AVX512-INFS: # %bb.0:		; AVX512-INFS: # %bb.0:
; AVX512-INFS-NEXT: vaddps {{.*}}(%rip){1to4}, %xmm0, %xmm0		; AVX512-INFS-NEXT: vsubps {{.*}}(%rip){1to4}, %xmm0, %xmm0
; AVX512-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0		; AVX512-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0
; AVX512-INFS-NEXT: retq		; AVX512-INFS-NEXT: retq
;		;
; FMA-NOINFS-LABEL: test_v4f32_mul_y_sub_x_one_undefs:		; FMA-NOINFS-LABEL: test_v4f32_mul_y_sub_x_one_undefs:
; FMA-NOINFS: # %bb.0:		; FMA-NOINFS: # %bb.0:
; FMA-NOINFS-NEXT: vfmsub213ps {{.*#+}} xmm0 = (xmm1 * xmm0) - xmm1		; FMA-NOINFS-NEXT: vfmsub213ps {{.*#+}} xmm0 = (xmm1 * xmm0) - xmm1
; FMA-NOINFS-NEXT: retq		; FMA-NOINFS-NEXT: retq
;		;
[... 9 lines not shown ...]
; AVX512-NOINFS-NEXT: retq		; AVX512-NOINFS-NEXT: retq
%s = fsub <4 x float> %x, <float 1.0, float 1.0, float 1.0, float undef>		%s = fsub <4 x float> %x, <float 1.0, float 1.0, float 1.0, float undef>
%m = fmul <4 x float> %y, %s		%m = fmul <4 x float> %y, %s
ret <4 x float> %m		ret <4 x float> %m
}		}

define <4 x float> @test_v4f32_mul_sub_x_negone_y(<4 x float> %x, <4 x float> %y) {		define <4 x float> @test_v4f32_mul_sub_x_negone_y(<4 x float> %x, <4 x float> %y) {
; FMA-INFS-LABEL: test_v4f32_mul_sub_x_negone_y:		; FMA-INFS-LABEL: test_v4f32_mul_sub_x_negone_y:
; FMA-INFS: # %bb.0:		; FMA-INFS: # %bb.0:
; FMA-INFS-NEXT: vaddps {{.*}}(%rip), %xmm0, %xmm0		; FMA-INFS-NEXT: vsubps {{.*}}(%rip), %xmm0, %xmm0
; FMA-INFS-NEXT: vmulps %xmm1, %xmm0, %xmm0		; FMA-INFS-NEXT: vmulps %xmm1, %xmm0, %xmm0
; FMA-INFS-NEXT: retq		; FMA-INFS-NEXT: retq
;		;
; FMA4-INFS-LABEL: test_v4f32_mul_sub_x_negone_y:		; FMA4-INFS-LABEL: test_v4f32_mul_sub_x_negone_y:
; FMA4-INFS: # %bb.0:		; FMA4-INFS: # %bb.0:
; FMA4-INFS-NEXT: vaddps {{.*}}(%rip), %xmm0, %xmm0		; FMA4-INFS-NEXT: vsubps {{.*}}(%rip), %xmm0, %xmm0
; FMA4-INFS-NEXT: vmulps %xmm1, %xmm0, %xmm0		; FMA4-INFS-NEXT: vmulps %xmm1, %xmm0, %xmm0
; FMA4-INFS-NEXT: retq		; FMA4-INFS-NEXT: retq
;		;
; AVX512-INFS-LABEL: test_v4f32_mul_sub_x_negone_y:		; AVX512-INFS-LABEL: test_v4f32_mul_sub_x_negone_y:
; AVX512-INFS: # %bb.0:		; AVX512-INFS: # %bb.0:
; AVX512-INFS-NEXT: vaddps {{.*}}(%rip){1to4}, %xmm0, %xmm0		; AVX512-INFS-NEXT: vsubps {{.*}}(%rip){1to4}, %xmm0, %xmm0
; AVX512-INFS-NEXT: vmulps %xmm1, %xmm0, %xmm0		; AVX512-INFS-NEXT: vmulps %xmm1, %xmm0, %xmm0
; AVX512-INFS-NEXT: retq		; AVX512-INFS-NEXT: retq
;		;
; FMA-NOINFS-LABEL: test_v4f32_mul_sub_x_negone_y:		; FMA-NOINFS-LABEL: test_v4f32_mul_sub_x_negone_y:
; FMA-NOINFS: # %bb.0:		; FMA-NOINFS: # %bb.0:
; FMA-NOINFS-NEXT: vfmadd213ps {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm1		; FMA-NOINFS-NEXT: vfmadd213ps {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm1
; FMA-NOINFS-NEXT: retq		; FMA-NOINFS-NEXT: retq
;		;
[... 9 lines not shown ...]
; AVX512-NOINFS-NEXT: retq		; AVX512-NOINFS-NEXT: retq
%s = fsub <4 x float> %x, <float -1.0, float -1.0, float -1.0, float -1.0>		%s = fsub <4 x float> %x, <float -1.0, float -1.0, float -1.0, float -1.0>
%m = fmul <4 x float> %s, %y		%m = fmul <4 x float> %s, %y
ret <4 x float> %m		ret <4 x float> %m
}		}

define <4 x float> @test_v4f32_mul_y_sub_x_negone(<4 x float> %x, <4 x float> %y) {		define <4 x float> @test_v4f32_mul_y_sub_x_negone(<4 x float> %x, <4 x float> %y) {
; FMA-INFS-LABEL: test_v4f32_mul_y_sub_x_negone:		; FMA-INFS-LABEL: test_v4f32_mul_y_sub_x_negone:
; FMA-INFS: # %bb.0:		; FMA-INFS: # %bb.0:
; FMA-INFS-NEXT: vaddps {{.*}}(%rip), %xmm0, %xmm0		; FMA-INFS-NEXT: vsubps {{.*}}(%rip), %xmm0, %xmm0
; FMA-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0		; FMA-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0
; FMA-INFS-NEXT: retq		; FMA-INFS-NEXT: retq
;		;
; FMA4-INFS-LABEL: test_v4f32_mul_y_sub_x_negone:		; FMA4-INFS-LABEL: test_v4f32_mul_y_sub_x_negone:
; FMA4-INFS: # %bb.0:		; FMA4-INFS: # %bb.0:
; FMA4-INFS-NEXT: vaddps {{.*}}(%rip), %xmm0, %xmm0		; FMA4-INFS-NEXT: vsubps {{.*}}(%rip), %xmm0, %xmm0
; FMA4-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0		; FMA4-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0
; FMA4-INFS-NEXT: retq		; FMA4-INFS-NEXT: retq
;		;
; AVX512-INFS-LABEL: test_v4f32_mul_y_sub_x_negone:		; AVX512-INFS-LABEL: test_v4f32_mul_y_sub_x_negone:
; AVX512-INFS: # %bb.0:		; AVX512-INFS: # %bb.0:
; AVX512-INFS-NEXT: vaddps {{.*}}(%rip){1to4}, %xmm0, %xmm0		; AVX512-INFS-NEXT: vsubps {{.*}}(%rip){1to4}, %xmm0, %xmm0
; AVX512-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0		; AVX512-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0
; AVX512-INFS-NEXT: retq		; AVX512-INFS-NEXT: retq
;		;
; FMA-NOINFS-LABEL: test_v4f32_mul_y_sub_x_negone:		; FMA-NOINFS-LABEL: test_v4f32_mul_y_sub_x_negone:
; FMA-NOINFS: # %bb.0:		; FMA-NOINFS: # %bb.0:
; FMA-NOINFS-NEXT: vfmadd213ps {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm1		; FMA-NOINFS-NEXT: vfmadd213ps {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm1
; FMA-NOINFS-NEXT: retq		; FMA-NOINFS-NEXT: retq
;		;
[... 9 lines not shown ...]
; AVX512-NOINFS-NEXT: retq		; AVX512-NOINFS-NEXT: retq
%s = fsub <4 x float> %x, <float -1.0, float -1.0, float -1.0, float -1.0>		%s = fsub <4 x float> %x, <float -1.0, float -1.0, float -1.0, float -1.0>
%m = fmul <4 x float> %y, %s		%m = fmul <4 x float> %y, %s
ret <4 x float> %m		ret <4 x float> %m
}		}

define <4 x float> @test_v4f32_mul_y_sub_x_negone_undefs(<4 x float> %x, <4 x float> %y) {		define <4 x float> @test_v4f32_mul_y_sub_x_negone_undefs(<4 x float> %x, <4 x float> %y) {
; FMA-INFS-LABEL: test_v4f32_mul_y_sub_x_negone_undefs:		; FMA-INFS-LABEL: test_v4f32_mul_y_sub_x_negone_undefs:
; FMA-INFS: # %bb.0:		; FMA-INFS: # %bb.0:
; FMA-INFS-NEXT: vaddps {{.*}}(%rip), %xmm0, %xmm0		; FMA-INFS-NEXT: vsubps {{.*}}(%rip), %xmm0, %xmm0
; FMA-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0		; FMA-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0
; FMA-INFS-NEXT: retq		; FMA-INFS-NEXT: retq
;		;
; FMA4-INFS-LABEL: test_v4f32_mul_y_sub_x_negone_undefs:		; FMA4-INFS-LABEL: test_v4f32_mul_y_sub_x_negone_undefs:
; FMA4-INFS: # %bb.0:		; FMA4-INFS: # %bb.0:
; FMA4-INFS-NEXT: vaddps {{.*}}(%rip), %xmm0, %xmm0		; FMA4-INFS-NEXT: vsubps {{.*}}(%rip), %xmm0, %xmm0
; FMA4-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0		; FMA4-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0
; FMA4-INFS-NEXT: retq		; FMA4-INFS-NEXT: retq
;		;
; AVX512-INFS-LABEL: test_v4f32_mul_y_sub_x_negone_undefs:		; AVX512-INFS-LABEL: test_v4f32_mul_y_sub_x_negone_undefs:
; AVX512-INFS: # %bb.0:		; AVX512-INFS: # %bb.0:
; AVX512-INFS-NEXT: vaddps {{.*}}(%rip){1to4}, %xmm0, %xmm0		; AVX512-INFS-NEXT: vsubps {{.*}}(%rip){1to4}, %xmm0, %xmm0
; AVX512-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0		; AVX512-INFS-NEXT: vmulps %xmm0, %xmm1, %xmm0
; AVX512-INFS-NEXT: retq		; AVX512-INFS-NEXT: retq
;		;
; FMA-NOINFS-LABEL: test_v4f32_mul_y_sub_x_negone_undefs:		; FMA-NOINFS-LABEL: test_v4f32_mul_y_sub_x_negone_undefs:
; FMA-NOINFS: # %bb.0:		; FMA-NOINFS: # %bb.0:
; FMA-NOINFS-NEXT: vfmadd213ps {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm1		; FMA-NOINFS-NEXT: vfmadd213ps {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm1
; FMA-NOINFS-NEXT: retq		; FMA-NOINFS-NEXT: retq
;		;
[... 547 lines not shown ...]

llvm/test/CodeGen/X86/fma_patterns_wide.ll

[... 625 lines not shown ...]
; AVX512-NOINFS-NEXT: retq		; AVX512-NOINFS-NEXT: retq
%s = fsub <8 x double> <double -1.0, double -1.0, double -1.0, double -1.0, double -1.0, double -1.0, double -1.0, double -1.0>, %x		%s = fsub <8 x double> <double -1.0, double -1.0, double -1.0, double -1.0, double -1.0, double -1.0, double -1.0, double -1.0>, %x
%m = fmul <8 x double> %y, %s		%m = fmul <8 x double> %y, %s
ret <8 x double> %m		ret <8 x double> %m
}		}

define <16 x float> @test_v16f32_mul_sub_x_one_y(<16 x float> %x, <16 x float> %y) {		define <16 x float> @test_v16f32_mul_sub_x_one_y(<16 x float> %x, <16 x float> %y) {
; FMA-INFS-LABEL: test_v16f32_mul_sub_x_one_y:		; FMA-INFS-LABEL: test_v16f32_mul_sub_x_one_y:
; FMA-INFS: # %bb.0:		; FMA-INFS: # %bb.0:
; FMA-INFS-NEXT: vmovaps {{.*#+}} ymm4 = [-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0]		; FMA-INFS-NEXT: vmovaps {{.*#+}} ymm4 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0]
; FMA-INFS-NEXT: vaddps %ymm4, %ymm1, %ymm1		; FMA-INFS-NEXT: vsubps %ymm4, %ymm1, %ymm1
; FMA-INFS-NEXT: vaddps %ymm4, %ymm0, %ymm0		; FMA-INFS-NEXT: vsubps %ymm4, %ymm0, %ymm0
; FMA-INFS-NEXT: vmulps %ymm2, %ymm0, %ymm0		; FMA-INFS-NEXT: vmulps %ymm2, %ymm0, %ymm0
; FMA-INFS-NEXT: vmulps %ymm3, %ymm1, %ymm1		; FMA-INFS-NEXT: vmulps %ymm3, %ymm1, %ymm1
; FMA-INFS-NEXT: retq		; FMA-INFS-NEXT: retq
;		;
; FMA4-INFS-LABEL: test_v16f32_mul_sub_x_one_y:		; FMA4-INFS-LABEL: test_v16f32_mul_sub_x_one_y:
; FMA4-INFS: # %bb.0:		; FMA4-INFS: # %bb.0:
; FMA4-INFS-NEXT: vmovaps {{.*#+}} ymm4 = [-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0]		; FMA4-INFS-NEXT: vmovaps {{.*#+}} ymm4 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0]
; FMA4-INFS-NEXT: vaddps %ymm4, %ymm1, %ymm1		; FMA4-INFS-NEXT: vsubps %ymm4, %ymm1, %ymm1
; FMA4-INFS-NEXT: vaddps %ymm4, %ymm0, %ymm0		; FMA4-INFS-NEXT: vsubps %ymm4, %ymm0, %ymm0
; FMA4-INFS-NEXT: vmulps %ymm2, %ymm0, %ymm0		; FMA4-INFS-NEXT: vmulps %ymm2, %ymm0, %ymm0
; FMA4-INFS-NEXT: vmulps %ymm3, %ymm1, %ymm1		; FMA4-INFS-NEXT: vmulps %ymm3, %ymm1, %ymm1
; FMA4-INFS-NEXT: retq		; FMA4-INFS-NEXT: retq
;		;
; AVX512-INFS-LABEL: test_v16f32_mul_sub_x_one_y:		; AVX512-INFS-LABEL: test_v16f32_mul_sub_x_one_y:
; AVX512-INFS: # %bb.0:		; AVX512-INFS: # %bb.0:
; AVX512-INFS-NEXT: vaddps {{.*}}(%rip){1to16}, %zmm0, %zmm0		; AVX512-INFS-NEXT: vsubps {{.*}}(%rip){1to16}, %zmm0, %zmm0
; AVX512-INFS-NEXT: vmulps %zmm1, %zmm0, %zmm0		; AVX512-INFS-NEXT: vmulps %zmm1, %zmm0, %zmm0
; AVX512-INFS-NEXT: retq		; AVX512-INFS-NEXT: retq
;		;
; FMA-NOINFS-LABEL: test_v16f32_mul_sub_x_one_y:		; FMA-NOINFS-LABEL: test_v16f32_mul_sub_x_one_y:
; FMA-NOINFS: # %bb.0:		; FMA-NOINFS: # %bb.0:
; FMA-NOINFS-NEXT: vfmsub213ps {{.*#+}} ymm0 = (ymm2 * ymm0) - ymm2		; FMA-NOINFS-NEXT: vfmsub213ps {{.*#+}} ymm0 = (ymm2 * ymm0) - ymm2
; FMA-NOINFS-NEXT: vfmsub213ps {{.*#+}} ymm1 = (ymm3 * ymm1) - ymm3		; FMA-NOINFS-NEXT: vfmsub213ps {{.*#+}} ymm1 = (ymm3 * ymm1) - ymm3
; FMA-NOINFS-NEXT: retq		; FMA-NOINFS-NEXT: retq
[... 11 lines not shown ...]
; AVX512-NOINFS-NEXT: retq		; AVX512-NOINFS-NEXT: retq
%s = fsub <16 x float> %x, <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>		%s = fsub <16 x float> %x, <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>
%m = fmul <16 x float> %s, %y		%m = fmul <16 x float> %s, %y
ret <16 x float> %m		ret <16 x float> %m
}		}

define <8 x double> @test_v8f64_mul_y_sub_x_one(<8 x double> %x, <8 x double> %y) {		define <8 x double> @test_v8f64_mul_y_sub_x_one(<8 x double> %x, <8 x double> %y) {
; FMA-INFS-LABEL: test_v8f64_mul_y_sub_x_one:		; FMA-INFS-LABEL: test_v8f64_mul_y_sub_x_one:
; FMA-INFS: # %bb.0:		; FMA-INFS: # %bb.0:
; FMA-INFS-NEXT: vmovapd {{.*#+}} ymm4 = [-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0]		; FMA-INFS-NEXT: vmovapd {{.*#+}} ymm4 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0]
; FMA-INFS-NEXT: vaddpd %ymm4, %ymm1, %ymm1		; FMA-INFS-NEXT: vsubpd %ymm4, %ymm1, %ymm1
; FMA-INFS-NEXT: vaddpd %ymm4, %ymm0, %ymm0		; FMA-INFS-NEXT: vsubpd %ymm4, %ymm0, %ymm0
; FMA-INFS-NEXT: vmulpd %ymm0, %ymm2, %ymm0		; FMA-INFS-NEXT: vmulpd %ymm0, %ymm2, %ymm0
; FMA-INFS-NEXT: vmulpd %ymm1, %ymm3, %ymm1		; FMA-INFS-NEXT: vmulpd %ymm1, %ymm3, %ymm1
; FMA-INFS-NEXT: retq		; FMA-INFS-NEXT: retq
;		;
; FMA4-INFS-LABEL: test_v8f64_mul_y_sub_x_one:		; FMA4-INFS-LABEL: test_v8f64_mul_y_sub_x_one:
; FMA4-INFS: # %bb.0:		; FMA4-INFS: # %bb.0:
; FMA4-INFS-NEXT: vmovapd {{.*#+}} ymm4 = [-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0]		; FMA4-INFS-NEXT: vmovapd {{.*#+}} ymm4 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0]
; FMA4-INFS-NEXT: vaddpd %ymm4, %ymm1, %ymm1		; FMA4-INFS-NEXT: vsubpd %ymm4, %ymm1, %ymm1
; FMA4-INFS-NEXT: vaddpd %ymm4, %ymm0, %ymm0		; FMA4-INFS-NEXT: vsubpd %ymm4, %ymm0, %ymm0
; FMA4-INFS-NEXT: vmulpd %ymm0, %ymm2, %ymm0		; FMA4-INFS-NEXT: vmulpd %ymm0, %ymm2, %ymm0
; FMA4-INFS-NEXT: vmulpd %ymm1, %ymm3, %ymm1		; FMA4-INFS-NEXT: vmulpd %ymm1, %ymm3, %ymm1
; FMA4-INFS-NEXT: retq		; FMA4-INFS-NEXT: retq
;		;
; AVX512-INFS-LABEL: test_v8f64_mul_y_sub_x_one:		; AVX512-INFS-LABEL: test_v8f64_mul_y_sub_x_one:
; AVX512-INFS: # %bb.0:		; AVX512-INFS: # %bb.0:
; AVX512-INFS-NEXT: vaddpd {{.*}}(%rip){1to8}, %zmm0, %zmm0		; AVX512-INFS-NEXT: vsubpd {{.*}}(%rip){1to8}, %zmm0, %zmm0
; AVX512-INFS-NEXT: vmulpd %zmm0, %zmm1, %zmm0		; AVX512-INFS-NEXT: vmulpd %zmm0, %zmm1, %zmm0
; AVX512-INFS-NEXT: retq		; AVX512-INFS-NEXT: retq
;		;
; FMA-NOINFS-LABEL: test_v8f64_mul_y_sub_x_one:		; FMA-NOINFS-LABEL: test_v8f64_mul_y_sub_x_one:
; FMA-NOINFS: # %bb.0:		; FMA-NOINFS: # %bb.0:
; FMA-NOINFS-NEXT: vfmsub213pd {{.*#+}} ymm0 = (ymm2 * ymm0) - ymm2		; FMA-NOINFS-NEXT: vfmsub213pd {{.*#+}} ymm0 = (ymm2 * ymm0) - ymm2
; FMA-NOINFS-NEXT: vfmsub213pd {{.*#+}} ymm1 = (ymm3 * ymm1) - ymm3		; FMA-NOINFS-NEXT: vfmsub213pd {{.*#+}} ymm1 = (ymm3 * ymm1) - ymm3
; FMA-NOINFS-NEXT: retq		; FMA-NOINFS-NEXT: retq
[... 11 lines not shown ...]
; AVX512-NOINFS-NEXT: retq		; AVX512-NOINFS-NEXT: retq
%s = fsub <8 x double> %x, <double 1.0, double 1.0, double 1.0, double 1.0, double 1.0, double 1.0, double 1.0, double 1.0>		%s = fsub <8 x double> %x, <double 1.0, double 1.0, double 1.0, double 1.0, double 1.0, double 1.0, double 1.0, double 1.0>
%m = fmul <8 x double> %y, %s		%m = fmul <8 x double> %y, %s
ret <8 x double> %m		ret <8 x double> %m
}		}

define <16 x float> @test_v16f32_mul_sub_x_negone_y(<16 x float> %x, <16 x float> %y) {		define <16 x float> @test_v16f32_mul_sub_x_negone_y(<16 x float> %x, <16 x float> %y) {
; FMA-INFS-LABEL: test_v16f32_mul_sub_x_negone_y:		; FMA-INFS-LABEL: test_v16f32_mul_sub_x_negone_y:
; FMA-INFS: # %bb.0:		; FMA-INFS: # %bb.0:
; FMA-INFS-NEXT: vmovaps {{.*#+}} ymm4 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0]		; FMA-INFS-NEXT: vmovaps {{.*#+}} ymm4 = [-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0]
; FMA-INFS-NEXT: vaddps %ymm4, %ymm1, %ymm1		; FMA-INFS-NEXT: vsubps %ymm4, %ymm1, %ymm1
; FMA-INFS-NEXT: vaddps %ymm4, %ymm0, %ymm0		; FMA-INFS-NEXT: vsubps %ymm4, %ymm0, %ymm0
; FMA-INFS-NEXT: vmulps %ymm2, %ymm0, %ymm0		; FMA-INFS-NEXT: vmulps %ymm2, %ymm0, %ymm0
; FMA-INFS-NEXT: vmulps %ymm3, %ymm1, %ymm1		; FMA-INFS-NEXT: vmulps %ymm3, %ymm1, %ymm1
; FMA-INFS-NEXT: retq		; FMA-INFS-NEXT: retq
;		;
; FMA4-INFS-LABEL: test_v16f32_mul_sub_x_negone_y:		; FMA4-INFS-LABEL: test_v16f32_mul_sub_x_negone_y:
; FMA4-INFS: # %bb.0:		; FMA4-INFS: # %bb.0:
; FMA4-INFS-NEXT: vmovaps {{.*#+}} ymm4 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0]		; FMA4-INFS-NEXT: vmovaps {{.*#+}} ymm4 = [-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0]
; FMA4-INFS-NEXT: vaddps %ymm4, %ymm1, %ymm1		; FMA4-INFS-NEXT: vsubps %ymm4, %ymm1, %ymm1
; FMA4-INFS-NEXT: vaddps %ymm4, %ymm0, %ymm0		; FMA4-INFS-NEXT: vsubps %ymm4, %ymm0, %ymm0
; FMA4-INFS-NEXT: vmulps %ymm2, %ymm0, %ymm0		; FMA4-INFS-NEXT: vmulps %ymm2, %ymm0, %ymm0
; FMA4-INFS-NEXT: vmulps %ymm3, %ymm1, %ymm1		; FMA4-INFS-NEXT: vmulps %ymm3, %ymm1, %ymm1
; FMA4-INFS-NEXT: retq		; FMA4-INFS-NEXT: retq
;		;
; AVX512-INFS-LABEL: test_v16f32_mul_sub_x_negone_y:		; AVX512-INFS-LABEL: test_v16f32_mul_sub_x_negone_y:
; AVX512-INFS: # %bb.0:		; AVX512-INFS: # %bb.0:
; AVX512-INFS-NEXT: vaddps {{.*}}(%rip){1to16}, %zmm0, %zmm0		; AVX512-INFS-NEXT: vsubps {{.*}}(%rip){1to16}, %zmm0, %zmm0
; AVX512-INFS-NEXT: vmulps %zmm1, %zmm0, %zmm0		; AVX512-INFS-NEXT: vmulps %zmm1, %zmm0, %zmm0
; AVX512-INFS-NEXT: retq		; AVX512-INFS-NEXT: retq
;		;
; FMA-NOINFS-LABEL: test_v16f32_mul_sub_x_negone_y:		; FMA-NOINFS-LABEL: test_v16f32_mul_sub_x_negone_y:
; FMA-NOINFS: # %bb.0:		; FMA-NOINFS: # %bb.0:
; FMA-NOINFS-NEXT: vfmadd213ps {{.*#+}} ymm0 = (ymm2 * ymm0) + ymm2		; FMA-NOINFS-NEXT: vfmadd213ps {{.*#+}} ymm0 = (ymm2 * ymm0) + ymm2
; FMA-NOINFS-NEXT: vfmadd213ps {{.*#+}} ymm1 = (ymm3 * ymm1) + ymm3		; FMA-NOINFS-NEXT: vfmadd213ps {{.*#+}} ymm1 = (ymm3 * ymm1) + ymm3
; FMA-NOINFS-NEXT: retq		; FMA-NOINFS-NEXT: retq
[... 11 lines not shown ...]
; AVX512-NOINFS-NEXT: retq		; AVX512-NOINFS-NEXT: retq
%s = fsub <16 x float> %x, <float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0>		%s = fsub <16 x float> %x, <float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0, float -1.0>
%m = fmul <16 x float> %s, %y		%m = fmul <16 x float> %s, %y
ret <16 x float> %m		ret <16 x float> %m
}		}

define <8 x double> @test_v8f64_mul_y_sub_x_negone(<8 x double> %x, <8 x double> %y) {		define <8 x double> @test_v8f64_mul_y_sub_x_negone(<8 x double> %x, <8 x double> %y) {
; FMA-INFS-LABEL: test_v8f64_mul_y_sub_x_negone:		; FMA-INFS-LABEL: test_v8f64_mul_y_sub_x_negone:
; FMA-INFS: # %bb.0:		; FMA-INFS: # %bb.0:
; FMA-INFS-NEXT: vmovapd {{.*#+}} ymm4 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0]		; FMA-INFS-NEXT: vmovapd {{.*#+}} ymm4 = [-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0]
; FMA-INFS-NEXT: vaddpd %ymm4, %ymm1, %ymm1		; FMA-INFS-NEXT: vsubpd %ymm4, %ymm1, %ymm1
; FMA-INFS-NEXT: vaddpd %ymm4, %ymm0, %ymm0		; FMA-INFS-NEXT: vsubpd %ymm4, %ymm0, %ymm0
; FMA-INFS-NEXT: vmulpd %ymm0, %ymm2, %ymm0		; FMA-INFS-NEXT: vmulpd %ymm0, %ymm2, %ymm0
; FMA-INFS-NEXT: vmulpd %ymm1, %ymm3, %ymm1		; FMA-INFS-NEXT: vmulpd %ymm1, %ymm3, %ymm1
; FMA-INFS-NEXT: retq		; FMA-INFS-NEXT: retq
;		;
; FMA4-INFS-LABEL: test_v8f64_mul_y_sub_x_negone:		; FMA4-INFS-LABEL: test_v8f64_mul_y_sub_x_negone:
; FMA4-INFS: # %bb.0:		; FMA4-INFS: # %bb.0:
; FMA4-INFS-NEXT: vmovapd {{.*#+}} ymm4 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0]		; FMA4-INFS-NEXT: vmovapd {{.*#+}} ymm4 = [-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0]
; FMA4-INFS-NEXT: vaddpd %ymm4, %ymm1, %ymm1		; FMA4-INFS-NEXT: vsubpd %ymm4, %ymm1, %ymm1
; FMA4-INFS-NEXT: vaddpd %ymm4, %ymm0, %ymm0		; FMA4-INFS-NEXT: vsubpd %ymm4, %ymm0, %ymm0
; FMA4-INFS-NEXT: vmulpd %ymm0, %ymm2, %ymm0		; FMA4-INFS-NEXT: vmulpd %ymm0, %ymm2, %ymm0
; FMA4-INFS-NEXT: vmulpd %ymm1, %ymm3, %ymm1		; FMA4-INFS-NEXT: vmulpd %ymm1, %ymm3, %ymm1
; FMA4-INFS-NEXT: retq		; FMA4-INFS-NEXT: retq
;		;
; AVX512-INFS-LABEL: test_v8f64_mul_y_sub_x_negone:		; AVX512-INFS-LABEL: test_v8f64_mul_y_sub_x_negone:
; AVX512-INFS: # %bb.0:		; AVX512-INFS: # %bb.0:
; AVX512-INFS-NEXT: vaddpd {{.*}}(%rip){1to8}, %zmm0, %zmm0		; AVX512-INFS-NEXT: vsubpd {{.*}}(%rip){1to8}, %zmm0, %zmm0
; AVX512-INFS-NEXT: vmulpd %zmm0, %zmm1, %zmm0		; AVX512-INFS-NEXT: vmulpd %zmm0, %zmm1, %zmm0
; AVX512-INFS-NEXT: retq		; AVX512-INFS-NEXT: retq
;		;
; FMA-NOINFS-LABEL: test_v8f64_mul_y_sub_x_negone:		; FMA-NOINFS-LABEL: test_v8f64_mul_y_sub_x_negone:
; FMA-NOINFS: # %bb.0:		; FMA-NOINFS: # %bb.0:
; FMA-NOINFS-NEXT: vfmadd213pd {{.*#+}} ymm0 = (ymm2 * ymm0) + ymm2		; FMA-NOINFS-NEXT: vfmadd213pd {{.*#+}} ymm0 = (ymm2 * ymm0) + ymm2
; FMA-NOINFS-NEXT: vfmadd213pd {{.*#+}} ymm1 = (ymm3 * ymm1) + ymm3		; FMA-NOINFS-NEXT: vfmadd213pd {{.*#+}} ymm1 = (ymm3 * ymm1) + ymm3
; FMA-NOINFS-NEXT: retq		; FMA-NOINFS-NEXT: retq
[... 372 lines not shown ...]

llvm/test/CodeGen/X86/fp-fold.ll

		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s		; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s

define float @fadd_zero_strict(float %x) {		define float @fadd_zero_strict(float %x) {
; CHECK-LABEL: fadd_zero_strict:		; CHECK-LABEL: fadd_zero_strict:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: xorps %xmm1, %xmm1		; CHECK-NEXT: xorps %xmm1, %xmm1
; CHECK-NEXT: addss %xmm1, %xmm0		; CHECK-NEXT: addss %xmm1, %xmm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
[... 175 lines not shown ...]
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%a = fadd <4 x float> %y, %x		%a = fadd <4 x float> %y, %x
%r = fsub reassoc nsz <4 x float> %y, %a		%r = fsub reassoc nsz <4 x float> %y, %a
ret <4 x float> %r		ret <4 x float> %r
}		}

define float @fsub_negzero_strict(float %x) {		define float @fsub_negzero_strict(float %x) {
; CHECK-LABEL: fsub_negzero_strict:		; CHECK-LABEL: fsub_negzero_strict:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: xorps %xmm1, %xmm1		; CHECK-NEXT: subss {{.*}}(%rip), %xmm0
; CHECK-NEXT: addss %xmm1, %xmm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%r = fsub float %x, -0.0		%r = fsub float %x, -0.0
ret float %r		ret float %r
}		}

define float @fsub_negzero_nsz(float %x) {		define float @fsub_negzero_nsz(float %x) {
; CHECK-LABEL: fsub_negzero_nsz:		; CHECK-LABEL: fsub_negzero_nsz:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%r = fsub nsz float %x, -0.0		%r = fsub nsz float %x, -0.0
ret float %r		ret float %r
}		}

define <4 x float> @fsub_negzero_strict_vector(<4 x float> %x) {		define <4 x float> @fsub_negzero_strict_vector(<4 x float> %x) {
; CHECK-LABEL: fsub_negzero_strict_vector:		; CHECK-LABEL: fsub_negzero_strict_vector:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: xorps %xmm1, %xmm1		; CHECK-NEXT: subps {{.*}}(%rip), %xmm0
; CHECK-NEXT: addps %xmm1, %xmm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%r = fsub <4 x float> %x, <float -0.0, float -0.0, float -0.0, float -0.0>		%r = fsub <4 x float> %x, <float -0.0, float -0.0, float -0.0, float -0.0>
ret <4 x float> %r		ret <4 x float> %r
}		}

define <4 x float> @fsub_negzero_nsz_vector(<4 x float> %x) {		define <4 x float> @fsub_negzero_nsz_vector(<4 x float> %x) {
; CHECK-LABEL: fsub_negzero_nsz_vector:		; CHECK-LABEL: fsub_negzero_nsz_vector:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
[... 48 lines not shown ...]

llvm/test/CodeGen/X86/fp_constant_op.ll

	[... 15 lines not shown ...]
	; CHECK-LABEL: foo_mul:			; CHECK-LABEL: foo_mul:
	; CHECK: fmul dword ptr			; CHECK: fmul dword ptr

	define double @foo_sub(double %P) {			define double @foo_sub(double %P) {
	%tmp.1 = fsub double %P, 1.230000e+02 ; <double> [#uses=1]			%tmp.1 = fsub double %P, 1.230000e+02 ; <double> [#uses=1]
	ret double %tmp.1			ret double %tmp.1
	}			}
	; CHECK-LABEL: foo_sub:			; CHECK-LABEL: foo_sub:
	; CHECK: fadd dword ptr			; CHECK: fsub dword ptr

	define double @foo_subr(double %P) {			define double @foo_subr(double %P) {
	%tmp.1 = fsub double 1.230000e+02, %P ; <double> [#uses=1]			%tmp.1 = fsub double 1.230000e+02, %P ; <double> [#uses=1]
	ret double %tmp.1			ret double %tmp.1
	}			}
	; CHECK-LABEL: foo_subr:			; CHECK-LABEL: foo_subr:
	; CHECK: fsub qword ptr			; CHECK: fsub qword ptr

	[... 14 lines not shown ...]

llvm/test/CodeGen/X86/limited-prec.ll

	[... 321 lines not shown ...]
	; precision6-NEXT: shrl $23, %eax			; precision6-NEXT: shrl $23, %eax
	; precision6-NEXT: addl $-127, %eax			; precision6-NEXT: addl $-127, %eax
	; precision6-NEXT: movl %eax, {{[0-9]+}}(%esp)			; precision6-NEXT: movl %eax, {{[0-9]+}}(%esp)
	; precision6-NEXT: flds (%esp)			; precision6-NEXT: flds (%esp)
	; precision6-NEXT: fld %st(0)			; precision6-NEXT: fld %st(0)
	; precision6-NEXT: fmuls {{\.LCPI.*}}			; precision6-NEXT: fmuls {{\.LCPI.*}}
	; precision6-NEXT: fadds {{\.LCPI.*}}			; precision6-NEXT: fadds {{\.LCPI.*}}
	; precision6-NEXT: fmulp %st, %st(1)			; precision6-NEXT: fmulp %st, %st(1)
	; precision6-NEXT: fadds {{\.LCPI.*}}			; precision6-NEXT: fsubs {{\.LCPI.*}}
	; precision6-NEXT: fildl {{[0-9]+}}(%esp)			; precision6-NEXT: fildl {{[0-9]+}}(%esp)
	; precision6-NEXT: fmuls {{\.LCPI.*}}			; precision6-NEXT: fmuls {{\.LCPI.*}}
	; precision6-NEXT: faddp %st, %st(1)			; precision6-NEXT: faddp %st, %st(1)
	; precision6-NEXT: addl $8, %esp			; precision6-NEXT: addl $8, %esp
	; precision6-NEXT: retl			; precision6-NEXT: retl
	;			;
	; precision12-LABEL: f4:			; precision12-LABEL: f4:
	; precision12: # %bb.0: # %entry			; precision12: # %bb.0: # %entry
	; precision12-NEXT: subl $8, %esp			; precision12-NEXT: subl $8, %esp
	; precision12-NEXT: movl {{[0-9]+}}(%esp), %eax			; precision12-NEXT: movl {{[0-9]+}}(%esp), %eax
	; precision12-NEXT: movl %eax, %ecx			; precision12-NEXT: movl %eax, %ecx
	; precision12-NEXT: andl $8388607, %ecx # imm = 0x7FFFFF			; precision12-NEXT: andl $8388607, %ecx # imm = 0x7FFFFF
	; precision12-NEXT: orl $1065353216, %ecx # imm = 0x3F800000			; precision12-NEXT: orl $1065353216, %ecx # imm = 0x3F800000
	; precision12-NEXT: movl %ecx, (%esp)			; precision12-NEXT: movl %ecx, (%esp)
	; precision12-NEXT: andl $2139095040, %eax # imm = 0x7F800000			; precision12-NEXT: andl $2139095040, %eax # imm = 0x7F800000
	; precision12-NEXT: shrl $23, %eax			; precision12-NEXT: shrl $23, %eax
	; precision12-NEXT: addl $-127, %eax			; precision12-NEXT: addl $-127, %eax
	; precision12-NEXT: movl %eax, {{[0-9]+}}(%esp)			; precision12-NEXT: movl %eax, {{[0-9]+}}(%esp)
	; precision12-NEXT: flds (%esp)			; precision12-NEXT: flds (%esp)
	; precision12-NEXT: fld %st(0)			; precision12-NEXT: fld %st(0)
	; precision12-NEXT: fmuls {{\.LCPI.*}}			; precision12-NEXT: fmuls {{\.LCPI.*}}
	; precision12-NEXT: fadds {{\.LCPI.*}}			; precision12-NEXT: fadds {{\.LCPI.*}}
	; precision12-NEXT: fmul %st(1), %st			; precision12-NEXT: fmul %st(1), %st
	; precision12-NEXT: fadds {{\.LCPI.*}}			; precision12-NEXT: fsubs {{\.LCPI.*}}
	; precision12-NEXT: fmul %st(1), %st			; precision12-NEXT: fmul %st(1), %st
	; precision12-NEXT: fadds {{\.LCPI.*}}			; precision12-NEXT: fadds {{\.LCPI.*}}
	; precision12-NEXT: fmulp %st, %st(1)			; precision12-NEXT: fmulp %st, %st(1)
	; precision12-NEXT: fadds {{\.LCPI.*}}			; precision12-NEXT: fsubs {{\.LCPI.*}}
	; precision12-NEXT: fildl {{[0-9]+}}(%esp)			; precision12-NEXT: fildl {{[0-9]+}}(%esp)
	; precision12-NEXT: fmuls {{\.LCPI.*}}			; precision12-NEXT: fmuls {{\.LCPI.*}}
	; precision12-NEXT: faddp %st, %st(1)			; precision12-NEXT: faddp %st, %st(1)
	; precision12-NEXT: addl $8, %esp			; precision12-NEXT: addl $8, %esp
	; precision12-NEXT: retl			; precision12-NEXT: retl
	;			;
	; precision18-LABEL: f4:			; precision18-LABEL: f4:
	; precision18: # %bb.0: # %entry			; precision18: # %bb.0: # %entry
	; precision18-NEXT: subl $8, %esp			; precision18-NEXT: subl $8, %esp
	; precision18-NEXT: movl {{[0-9]+}}(%esp), %eax			; precision18-NEXT: movl {{[0-9]+}}(%esp), %eax
	; precision18-NEXT: movl %eax, %ecx			; precision18-NEXT: movl %eax, %ecx
	; precision18-NEXT: andl $8388607, %ecx # imm = 0x7FFFFF			; precision18-NEXT: andl $8388607, %ecx # imm = 0x7FFFFF
	; precision18-NEXT: orl $1065353216, %ecx # imm = 0x3F800000			; precision18-NEXT: orl $1065353216, %ecx # imm = 0x3F800000
	; precision18-NEXT: movl %ecx, (%esp)			; precision18-NEXT: movl %ecx, (%esp)
	; precision18-NEXT: andl $2139095040, %eax # imm = 0x7F800000			; precision18-NEXT: andl $2139095040, %eax # imm = 0x7F800000
	; precision18-NEXT: shrl $23, %eax			; precision18-NEXT: shrl $23, %eax
	; precision18-NEXT: addl $-127, %eax			; precision18-NEXT: addl $-127, %eax
	; precision18-NEXT: movl %eax, {{[0-9]+}}(%esp)			; precision18-NEXT: movl %eax, {{[0-9]+}}(%esp)
	; precision18-NEXT: flds (%esp)			; precision18-NEXT: flds (%esp)
	; precision18-NEXT: fld %st(0)			; precision18-NEXT: fld %st(0)
	; precision18-NEXT: fmuls {{\.LCPI.*}}			; precision18-NEXT: fmuls {{\.LCPI.*}}
	; precision18-NEXT: fadds {{\.LCPI.*}}			; precision18-NEXT: fadds {{\.LCPI.*}}
	; precision18-NEXT: fmul %st(1), %st			; precision18-NEXT: fmul %st(1), %st
	; precision18-NEXT: fadds {{\.LCPI.*}}			; precision18-NEXT: fsubs {{\.LCPI.*}}
	; precision18-NEXT: fmul %st(1), %st			; precision18-NEXT: fmul %st(1), %st
	; precision18-NEXT: fadds {{\.LCPI.*}}			; precision18-NEXT: fadds {{\.LCPI.*}}
	; precision18-NEXT: fmul %st(1), %st			; precision18-NEXT: fmul %st(1), %st
	; precision18-NEXT: fadds {{\.LCPI.*}}			; precision18-NEXT: fsubs {{\.LCPI.*}}
	; precision18-NEXT: fmul %st(1), %st			; precision18-NEXT: fmul %st(1), %st
	; precision18-NEXT: fadds {{\.LCPI.*}}			; precision18-NEXT: fadds {{\.LCPI.*}}
	; precision18-NEXT: fmulp %st, %st(1)			; precision18-NEXT: fmulp %st, %st(1)
	; precision18-NEXT: fadds {{\.LCPI.*}}			; precision18-NEXT: fsubs {{\.LCPI.*}}
	; precision18-NEXT: fildl {{[0-9]+}}(%esp)			; precision18-NEXT: fildl {{[0-9]+}}(%esp)
	; precision18-NEXT: fmuls {{\.LCPI.*}}			; precision18-NEXT: fmuls {{\.LCPI.*}}
	; precision18-NEXT: faddp %st, %st(1)			; precision18-NEXT: faddp %st, %st(1)
	; precision18-NEXT: addl $8, %esp			; precision18-NEXT: addl $8, %esp
	; precision18-NEXT: retl			; precision18-NEXT: retl
	entry:			entry:
	%"alloca point" = bitcast i32 0 to i32 ; <i32> [#uses=0]			%"alloca point" = bitcast i32 0 to i32 ; <i32> [#uses=0]
	%0 = call float @llvm.log.f32(float %x) ; <float> [#uses=1]			%0 = call float @llvm.log.f32(float %x) ; <float> [#uses=1]
	[... 15 lines not shown ...]
	; precision6-NEXT: shrl $23, %eax			; precision6-NEXT: shrl $23, %eax
	; precision6-NEXT: addl $-127, %eax			; precision6-NEXT: addl $-127, %eax
	; precision6-NEXT: movl %eax, {{[0-9]+}}(%esp)			; precision6-NEXT: movl %eax, {{[0-9]+}}(%esp)
	; precision6-NEXT: flds (%esp)			; precision6-NEXT: flds (%esp)
	; precision6-NEXT: fld %st(0)			; precision6-NEXT: fld %st(0)
	; precision6-NEXT: fmuls {{\.LCPI.*}}			; precision6-NEXT: fmuls {{\.LCPI.*}}
	; precision6-NEXT: fadds {{\.LCPI.*}}			; precision6-NEXT: fadds {{\.LCPI.*}}
	; precision6-NEXT: fmulp %st, %st(1)			; precision6-NEXT: fmulp %st, %st(1)
	; precision6-NEXT: fadds {{\.LCPI.*}}			; precision6-NEXT: fsubs {{\.LCPI.*}}
	; precision6-NEXT: fiaddl {{[0-9]+}}(%esp)			; precision6-NEXT: fiaddl {{[0-9]+}}(%esp)
	; precision6-NEXT: addl $8, %esp			; precision6-NEXT: addl $8, %esp
	; precision6-NEXT: retl			; precision6-NEXT: retl
	;			;
	; precision12-LABEL: f5:			; precision12-LABEL: f5:
	; precision12: # %bb.0: # %entry			; precision12: # %bb.0: # %entry
	; precision12-NEXT: subl $8, %esp			; precision12-NEXT: subl $8, %esp
	; precision12-NEXT: movl {{[0-9]+}}(%esp), %eax			; precision12-NEXT: movl {{[0-9]+}}(%esp), %eax
	; precision12-NEXT: movl %eax, %ecx			; precision12-NEXT: movl %eax, %ecx
	; precision12-NEXT: andl $8388607, %ecx # imm = 0x7FFFFF			; precision12-NEXT: andl $8388607, %ecx # imm = 0x7FFFFF
	; precision12-NEXT: orl $1065353216, %ecx # imm = 0x3F800000			; precision12-NEXT: orl $1065353216, %ecx # imm = 0x3F800000
	; precision12-NEXT: movl %ecx, (%esp)			; precision12-NEXT: movl %ecx, (%esp)
	; precision12-NEXT: andl $2139095040, %eax # imm = 0x7F800000			; precision12-NEXT: andl $2139095040, %eax # imm = 0x7F800000
	; precision12-NEXT: shrl $23, %eax			; precision12-NEXT: shrl $23, %eax
	; precision12-NEXT: addl $-127, %eax			; precision12-NEXT: addl $-127, %eax
	; precision12-NEXT: movl %eax, {{[0-9]+}}(%esp)			; precision12-NEXT: movl %eax, {{[0-9]+}}(%esp)
	; precision12-NEXT: flds (%esp)			; precision12-NEXT: flds (%esp)
	; precision12-NEXT: fld %st(0)			; precision12-NEXT: fld %st(0)
	; precision12-NEXT: fmuls {{\.LCPI.*}}			; precision12-NEXT: fmuls {{\.LCPI.*}}
	; precision12-NEXT: fadds {{\.LCPI.*}}			; precision12-NEXT: fadds {{\.LCPI.*}}
	; precision12-NEXT: fmul %st(1), %st			; precision12-NEXT: fmul %st(1), %st
	; precision12-NEXT: fadds {{\.LCPI.*}}			; precision12-NEXT: fsubs {{\.LCPI.*}}
	; precision12-NEXT: fmul %st(1), %st			; precision12-NEXT: fmul %st(1), %st
	; precision12-NEXT: fadds {{\.LCPI.*}}			; precision12-NEXT: fadds {{\.LCPI.*}}
	; precision12-NEXT: fmulp %st, %st(1)			; precision12-NEXT: fmulp %st, %st(1)
	; precision12-NEXT: fadds {{\.LCPI.*}}			; precision12-NEXT: fsubs {{\.LCPI.*}}
	; precision12-NEXT: fiaddl {{[0-9]+}}(%esp)			; precision12-NEXT: fiaddl {{[0-9]+}}(%esp)
	; precision12-NEXT: addl $8, %esp			; precision12-NEXT: addl $8, %esp
	; precision12-NEXT: retl			; precision12-NEXT: retl
	;			;
	; precision18-LABEL: f5:			; precision18-LABEL: f5:
	; precision18: # %bb.0: # %entry			; precision18: # %bb.0: # %entry
	; precision18-NEXT: subl $8, %esp			; precision18-NEXT: subl $8, %esp
	; precision18-NEXT: movl {{[0-9]+}}(%esp), %eax			; precision18-NEXT: movl {{[0-9]+}}(%esp), %eax
	; precision18-NEXT: movl %eax, %ecx			; precision18-NEXT: movl %eax, %ecx
	; precision18-NEXT: andl $8388607, %ecx # imm = 0x7FFFFF			; precision18-NEXT: andl $8388607, %ecx # imm = 0x7FFFFF
	; precision18-NEXT: orl $1065353216, %ecx # imm = 0x3F800000			; precision18-NEXT: orl $1065353216, %ecx # imm = 0x3F800000
	; precision18-NEXT: movl %ecx, (%esp)			; precision18-NEXT: movl %ecx, (%esp)
	; precision18-NEXT: andl $2139095040, %eax # imm = 0x7F800000			; precision18-NEXT: andl $2139095040, %eax # imm = 0x7F800000
	; precision18-NEXT: shrl $23, %eax			; precision18-NEXT: shrl $23, %eax
	; precision18-NEXT: addl $-127, %eax			; precision18-NEXT: addl $-127, %eax
	; precision18-NEXT: movl %eax, {{[0-9]+}}(%esp)			; precision18-NEXT: movl %eax, {{[0-9]+}}(%esp)
	; precision18-NEXT: flds (%esp)			; precision18-NEXT: flds (%esp)
	; precision18-NEXT: fld %st(0)			; precision18-NEXT: fld %st(0)
	; precision18-NEXT: fmuls {{\.LCPI.*}}			; precision18-NEXT: fmuls {{\.LCPI.*}}
	; precision18-NEXT: fadds {{\.LCPI.*}}			; precision18-NEXT: fadds {{\.LCPI.*}}
	; precision18-NEXT: fmul %st(1), %st			; precision18-NEXT: fmul %st(1), %st
	; precision18-NEXT: fadds {{\.LCPI.*}}			; precision18-NEXT: fsubs {{\.LCPI.*}}
	; precision18-NEXT: fmul %st(1), %st			; precision18-NEXT: fmul %st(1), %st
	; precision18-NEXT: fadds {{\.LCPI.*}}			; precision18-NEXT: fadds {{\.LCPI.*}}
	; precision18-NEXT: fmul %st(1), %st			; precision18-NEXT: fmul %st(1), %st
	; precision18-NEXT: fadds {{\.LCPI.*}}			; precision18-NEXT: fsubs {{\.LCPI.*}}
	; precision18-NEXT: fmul %st(1), %st			; precision18-NEXT: fmul %st(1), %st
	; precision18-NEXT: fadds {{\.LCPI.*}}			; precision18-NEXT: fadds {{\.LCPI.*}}
	; precision18-NEXT: fmulp %st, %st(1)			; precision18-NEXT: fmulp %st, %st(1)
	; precision18-NEXT: fadds {{\.LCPI.*}}			; precision18-NEXT: fsubs {{\.LCPI.*}}
	; precision18-NEXT: fiaddl {{[0-9]+}}(%esp)			; precision18-NEXT: fiaddl {{[0-9]+}}(%esp)
	; precision18-NEXT: addl $8, %esp			; precision18-NEXT: addl $8, %esp
	; precision18-NEXT: retl			; precision18-NEXT: retl
	entry:			entry:
	%"alloca point" = bitcast i32 0 to i32 ; <i32> [#uses=0]			%"alloca point" = bitcast i32 0 to i32 ; <i32> [#uses=0]
	%0 = call float @llvm.log2.f32(float %x) ; <float> [#uses=1]			%0 = call float @llvm.log2.f32(float %x) ; <float> [#uses=1]
	ret float %0			ret float %0
	}			}
	[... 13 lines not shown ...]
	; precision6-NEXT: shrl $23, %eax			; precision6-NEXT: shrl $23, %eax
	; precision6-NEXT: addl $-127, %eax			; precision6-NEXT: addl $-127, %eax
	; precision6-NEXT: movl %eax, {{[0-9]+}}(%esp)			; precision6-NEXT: movl %eax, {{[0-9]+}}(%esp)
	; precision6-NEXT: flds (%esp)			; precision6-NEXT: flds (%esp)
	; precision6-NEXT: fld %st(0)			; precision6-NEXT: fld %st(0)
	; precision6-NEXT: fmuls {{\.LCPI.*}}			; precision6-NEXT: fmuls {{\.LCPI.*}}
	; precision6-NEXT: fadds {{\.LCPI.*}}			; precision6-NEXT: fadds {{\.LCPI.*}}
	; precision6-NEXT: fmulp %st, %st(1)			; precision6-NEXT: fmulp %st, %st(1)
	; precision6-NEXT: fadds {{\.LCPI.*}}			; precision6-NEXT: fsubs {{\.LCPI.*}}
	; precision6-NEXT: fildl {{[0-9]+}}(%esp)			; precision6-NEXT: fildl {{[0-9]+}}(%esp)
	; precision6-NEXT: fmuls {{\.LCPI.*}}			; precision6-NEXT: fmuls {{\.LCPI.*}}
	; precision6-NEXT: faddp %st, %st(1)			; precision6-NEXT: faddp %st, %st(1)
	; precision6-NEXT: addl $8, %esp			; precision6-NEXT: addl $8, %esp
	; precision6-NEXT: retl			; precision6-NEXT: retl
	;			;
	; precision12-LABEL: f6:			; precision12-LABEL: f6:
	; precision12: # %bb.0: # %entry			; precision12: # %bb.0: # %entry
	; precision12-NEXT: subl $8, %esp			; precision12-NEXT: subl $8, %esp
	; precision12-NEXT: movl {{[0-9]+}}(%esp), %eax			; precision12-NEXT: movl {{[0-9]+}}(%esp), %eax
	; precision12-NEXT: movl %eax, %ecx			; precision12-NEXT: movl %eax, %ecx
	; precision12-NEXT: andl $8388607, %ecx # imm = 0x7FFFFF			; precision12-NEXT: andl $8388607, %ecx # imm = 0x7FFFFF
	; precision12-NEXT: orl $1065353216, %ecx # imm = 0x3F800000			; precision12-NEXT: orl $1065353216, %ecx # imm = 0x3F800000
	; precision12-NEXT: movl %ecx, (%esp)			; precision12-NEXT: movl %ecx, (%esp)
	; precision12-NEXT: andl $2139095040, %eax # imm = 0x7F800000			; precision12-NEXT: andl $2139095040, %eax # imm = 0x7F800000
	; precision12-NEXT: shrl $23, %eax			; precision12-NEXT: shrl $23, %eax
	; precision12-NEXT: addl $-127, %eax			; precision12-NEXT: addl $-127, %eax
	; precision12-NEXT: movl %eax, {{[0-9]+}}(%esp)			; precision12-NEXT: movl %eax, {{[0-9]+}}(%esp)
	; precision12-NEXT: flds (%esp)			; precision12-NEXT: flds (%esp)
	; precision12-NEXT: fld %st(0)			; precision12-NEXT: fld %st(0)
	; precision12-NEXT: fmuls {{\.LCPI.*}}			; precision12-NEXT: fmuls {{\.LCPI.*}}
	; precision12-NEXT: fadds {{\.LCPI.*}}			; precision12-NEXT: fsubs {{\.LCPI.*}}
	; precision12-NEXT: fmul %st(1), %st			; precision12-NEXT: fmul %st(1), %st
	; precision12-NEXT: fadds {{\.LCPI.*}}			; precision12-NEXT: fadds {{\.LCPI.*}}
	; precision12-NEXT: fmulp %st, %st(1)			; precision12-NEXT: fmulp %st, %st(1)
	; precision12-NEXT: fadds {{\.LCPI.*}}			; precision12-NEXT: fsubs {{\.LCPI.*}}
	; precision12-NEXT: fildl {{[0-9]+}}(%esp)			; precision12-NEXT: fildl {{[0-9]+}}(%esp)
	; precision12-NEXT: fmuls {{\.LCPI.*}}			; precision12-NEXT: fmuls {{\.LCPI.*}}
	; precision12-NEXT: faddp %st, %st(1)			; precision12-NEXT: faddp %st, %st(1)
	; precision12-NEXT: addl $8, %esp			; precision12-NEXT: addl $8, %esp
	; precision12-NEXT: retl			; precision12-NEXT: retl
	;			;
	; precision18-LABEL: f6:			; precision18-LABEL: f6:
	; precision18: # %bb.0: # %entry			; precision18: # %bb.0: # %entry
	; precision18-NEXT: subl $8, %esp			; precision18-NEXT: subl $8, %esp
	; precision18-NEXT: movl {{[0-9]+}}(%esp), %eax			; precision18-NEXT: movl {{[0-9]+}}(%esp), %eax
	; precision18-NEXT: movl %eax, %ecx			; precision18-NEXT: movl %eax, %ecx
	; precision18-NEXT: andl $8388607, %ecx # imm = 0x7FFFFF			; precision18-NEXT: andl $8388607, %ecx # imm = 0x7FFFFF
	; precision18-NEXT: orl $1065353216, %ecx # imm = 0x3F800000			; precision18-NEXT: orl $1065353216, %ecx # imm = 0x3F800000
	; precision18-NEXT: movl %ecx, (%esp)			; precision18-NEXT: movl %ecx, (%esp)
	; precision18-NEXT: andl $2139095040, %eax # imm = 0x7F800000			; precision18-NEXT: andl $2139095040, %eax # imm = 0x7F800000
	; precision18-NEXT: shrl $23, %eax			; precision18-NEXT: shrl $23, %eax
	; precision18-NEXT: addl $-127, %eax			; precision18-NEXT: addl $-127, %eax
	; precision18-NEXT: movl %eax, {{[0-9]+}}(%esp)			; precision18-NEXT: movl %eax, {{[0-9]+}}(%esp)
	; precision18-NEXT: flds (%esp)			; precision18-NEXT: flds (%esp)
	; precision18-NEXT: fld %st(0)			; precision18-NEXT: fld %st(0)
	; precision18-NEXT: fmuls {{\.LCPI.*}}			; precision18-NEXT: fmuls {{\.LCPI.*}}
	; precision18-NEXT: fadds {{\.LCPI.*}}			; precision18-NEXT: fsubs {{\.LCPI.*}}
	; precision18-NEXT: fmul %st(1), %st			; precision18-NEXT: fmul %st(1), %st
	; precision18-NEXT: fadds {{\.LCPI.*}}			; precision18-NEXT: fadds {{\.LCPI.*}}
	; precision18-NEXT: fmul %st(1), %st			; precision18-NEXT: fmul %st(1), %st
	; precision18-NEXT: fadds {{\.LCPI.*}}			; precision18-NEXT: fsubs {{\.LCPI.*}}
	; precision18-NEXT: fmul %st(1), %st			; precision18-NEXT: fmul %st(1), %st
	; precision18-NEXT: fadds {{\.LCPI.*}}			; precision18-NEXT: fadds {{\.LCPI.*}}
	; precision18-NEXT: fmulp %st, %st(1)			; precision18-NEXT: fmulp %st, %st(1)
	; precision18-NEXT: fadds {{\.LCPI.*}}			; precision18-NEXT: fsubs {{\.LCPI.*}}
	; precision18-NEXT: fildl {{[0-9]+}}(%esp)			; precision18-NEXT: fildl {{[0-9]+}}(%esp)
	; precision18-NEXT: fmuls {{\.LCPI.*}}			; precision18-NEXT: fmuls {{\.LCPI.*}}
	; precision18-NEXT: faddp %st, %st(1)			; precision18-NEXT: faddp %st, %st(1)
	; precision18-NEXT: addl $8, %esp			; precision18-NEXT: addl $8, %esp
	; precision18-NEXT: retl			; precision18-NEXT: retl
	entry:			entry:
	%"alloca point" = bitcast i32 0 to i32 ; <i32> [#uses=0]			%"alloca point" = bitcast i32 0 to i32 ; <i32> [#uses=0]
	%0 = call float @llvm.log10.f32(float %x) ; <float> [#uses=1]			%0 = call float @llvm.log10.f32(float %x) ; <float> [#uses=1]
	ret float %0			ret float %0
	}			}

	declare float @llvm.log10.f32(float) nounwind readonly			declare float @llvm.log10.f32(float) nounwind readonly

llvm/test/CodeGen/X86/load-scalar-as-vector.ll

[... 567 lines not shown ...]
; AVX-NEXT: retq		; AVX-NEXT: retq
%r = insertelement <4 x float> undef, float %b, i32 0		%r = insertelement <4 x float> undef, float %b, i32 0
ret <4 x float> %r		ret <4 x float> %r
}		}

define <2 x double> @fsub_op1_constant(double* %p) nounwind {		define <2 x double> @fsub_op1_constant(double* %p) nounwind {
; SSE-LABEL: fsub_op1_constant:		; SSE-LABEL: fsub_op1_constant:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero		; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
; SSE-NEXT: addsd {{.*}}(%rip), %xmm0		; SSE-NEXT: subsd {{.*}}(%rip), %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: fsub_op1_constant:		; AVX-LABEL: fsub_op1_constant:
; AVX: # %bb.0:		; AVX: # %bb.0:
; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero		; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; AVX-NEXT: vaddsd {{.*}}(%rip), %xmm0, %xmm0		; AVX-NEXT: vsubsd {{.*}}(%rip), %xmm0, %xmm0
; AVX-NEXT: retq		; AVX-NEXT: retq
%x = load double, double* %p		%x = load double, double* %p
%b = fsub double %x, 42.0		%b = fsub double %x, 42.0
%r = insertelement <2 x double> undef, double %b, i32 0		%r = insertelement <2 x double> undef, double %b, i32 0
ret <2 x double> %r		ret <2 x double> %r
}		}

define <4 x float> @fsub_op0_constant(float* %p) nounwind {		define <4 x float> @fsub_op0_constant(float* %p) nounwind {
[... 279 lines not shown ...]

llvm/test/CodeGen/X86/negative-sin.ll

	[... 65 lines not shown ...]
	}			}

	; The 2nd negate is strict, so we can't kill it. It becomes an add of zero instead.			; The 2nd negate is strict, so we can't kill it. It becomes an add of zero instead.

	define double @semi_strict2(double %e) nounwind {			define double @semi_strict2(double %e) nounwind {
	; CHECK-LABEL: semi_strict2:			; CHECK-LABEL: semi_strict2:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: pushq %rax			; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: vxorpd {{.*}}(%rip), %xmm0, %xmm0
	; CHECK-NEXT: callq sin			; CHECK-NEXT: callq sin
	; CHECK-NEXT: vxorpd %xmm1, %xmm1, %xmm1			; CHECK-NEXT: vxorpd %xmm1, %xmm1, %xmm1
	; CHECK-NEXT: vaddsd %xmm1, %xmm0, %xmm0			; CHECK-NEXT: vsubsd %xmm0, %xmm1, %xmm0
	; CHECK-NEXT: popq %rax			; CHECK-NEXT: popq %rax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%f = fsub nsz double 0.0, %e			%f = fsub nsz double 0.0, %e
	%g = call double @sin(double %f) readonly			%g = call double @sin(double %f) readonly
	%h = fsub double 0.0, %g			%h = fsub double 0.0, %g
	ret double %h			ret double %h
	}			}

	[... 15 lines not shown ...]

llvm/test/CodeGen/X86/pr44749.ll

	[... 24 lines not shown ...]
	; CHECK-NEXT: ## %bb.1: ## %entry			; CHECK-NEXT: ## %bb.1: ## %entry
	; CHECK-NEXT: movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 ## 8-byte Reload			; CHECK-NEXT: movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 ## 8-byte Reload
	; CHECK-NEXT: ## xmm0 = mem[0],zero			; CHECK-NEXT: ## xmm0 = mem[0],zero
	; CHECK-NEXT: movsd %xmm0, (%rsp) ## 8-byte Spill			; CHECK-NEXT: movsd %xmm0, (%rsp) ## 8-byte Spill
	; CHECK-NEXT: LBB0_2: ## %entry			; CHECK-NEXT: LBB0_2: ## %entry
	; CHECK-NEXT: movsd (%rsp), %xmm0 ## 8-byte Reload			; CHECK-NEXT: movsd (%rsp), %xmm0 ## 8-byte Reload
	; CHECK-NEXT: ## xmm0 = mem[0],zero			; CHECK-NEXT: ## xmm0 = mem[0],zero
	; CHECK-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; CHECK-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
	; CHECK-NEXT: addsd %xmm1, %xmm0			; CHECK-NEXT: subsd %xmm1, %xmm0
	; CHECK-NEXT: movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 ## 8-byte Reload			; CHECK-NEXT: movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 ## 8-byte Reload
	; CHECK-NEXT: ## xmm1 = mem[0],zero			; CHECK-NEXT: ## xmm1 = mem[0],zero
	; CHECK-NEXT: ucomisd %xmm0, %xmm1			; CHECK-NEXT: ucomisd %xmm0, %xmm1
	; CHECK-NEXT: setae %al			; CHECK-NEXT: setae %al
	; CHECK-NEXT: movzbl %al, %ecx			; CHECK-NEXT: movzbl %al, %ecx
	; CHECK-NEXT: movl %ecx, %edx			; CHECK-NEXT: movl %ecx, %edx
	; CHECK-NEXT: leaq {{.*}}(%rip), %rsi			; CHECK-NEXT: leaq {{.*}}(%rip), %rsi
	; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	[... 15 lines not shown ...]

llvm/test/CodeGen/X86/vec_ss_load_fold.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -disable-peephole -mtriple=i686-apple-darwin9 -mattr=+sse,+sse2,+sse4.1 | FileCheck %s --check-prefix=X32		; RUN: llc < %s -disable-peephole -mtriple=i686-apple-darwin9 -mattr=+sse,+sse2,+sse4.1 | FileCheck %s --check-prefix=X32
; RUN: llc < %s -disable-peephole -mtriple=x86_64-apple-darwin9 -mattr=+sse,+sse2,+sse4.1 | FileCheck %s --check-prefix=X64		; RUN: llc < %s -disable-peephole -mtriple=x86_64-apple-darwin9 -mattr=+sse,+sse2,+sse4.1 | FileCheck %s --check-prefix=X64
; RUN: llc < %s -disable-peephole -mtriple=i686-apple-darwin9 -mattr=+avx | FileCheck %s --check-prefix=X32_AVX --check-prefix=X32_AVX1		; RUN: llc < %s -disable-peephole -mtriple=i686-apple-darwin9 -mattr=+avx | FileCheck %s --check-prefix=X32_AVX --check-prefix=X32_AVX1
; RUN: llc < %s -disable-peephole -mtriple=x86_64-apple-darwin9 -mattr=+avx | FileCheck %s --check-prefix=X64_AVX --check-prefix=X64_AVX1		; RUN: llc < %s -disable-peephole -mtriple=x86_64-apple-darwin9 -mattr=+avx | FileCheck %s --check-prefix=X64_AVX --check-prefix=X64_AVX1
; RUN: llc < %s -disable-peephole -mtriple=i686-apple-darwin9 -mattr=+avx512f | FileCheck %s --check-prefix=X32_AVX --check-prefix=X32_AVX512		; RUN: llc < %s -disable-peephole -mtriple=i686-apple-darwin9 -mattr=+avx512f | FileCheck %s --check-prefix=X32_AVX --check-prefix=X32_AVX512
; RUN: llc < %s -disable-peephole -mtriple=x86_64-apple-darwin9 -mattr=+avx512f | FileCheck %s --check-prefix=X64_AVX --check-prefix=X64_AVX512		; RUN: llc < %s -disable-peephole -mtriple=x86_64-apple-darwin9 -mattr=+avx512f | FileCheck %s --check-prefix=X64_AVX --check-prefix=X64_AVX512

define i16 @test1(float %f) nounwind {		define i16 @test1(float %f) nounwind {
; X32-LABEL: test1:		; X32-LABEL: test1:
; X32: ## %bb.0:		; X32: ## %bb.0:
; X32-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero		; X32-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; X32-NEXT: addss LCPI0_0, %xmm0		; X32-NEXT: subss LCPI0_0, %xmm0
; X32-NEXT: mulss LCPI0_1, %xmm0		; X32-NEXT: mulss LCPI0_1, %xmm0
; X32-NEXT: xorps %xmm1, %xmm1		; X32-NEXT: xorps %xmm1, %xmm1
; X32-NEXT: blendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]		; X32-NEXT: blendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
; X32-NEXT: minss LCPI0_2, %xmm0		; X32-NEXT: minss LCPI0_2, %xmm0
; X32-NEXT: maxss %xmm1, %xmm0		; X32-NEXT: maxss %xmm1, %xmm0
; X32-NEXT: cvttss2si %xmm0, %eax		; X32-NEXT: cvttss2si %xmm0, %eax
; X32-NEXT: ## kill: def $ax killed $ax killed $eax		; X32-NEXT: ## kill: def $ax killed $ax killed $eax
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test1:		; X64-LABEL: test1:
; X64: ## %bb.0:		; X64: ## %bb.0:
; X64-NEXT: addss {{.*}}(%rip), %xmm0		; X64-NEXT: subss {{.*}}(%rip), %xmm0
; X64-NEXT: mulss {{.*}}(%rip), %xmm0		; X64-NEXT: mulss {{.*}}(%rip), %xmm0
; X64-NEXT: xorps %xmm1, %xmm1		; X64-NEXT: xorps %xmm1, %xmm1
; X64-NEXT: blendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]		; X64-NEXT: blendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
; X64-NEXT: minss {{.*}}(%rip), %xmm0		; X64-NEXT: minss {{.*}}(%rip), %xmm0
; X64-NEXT: maxss %xmm1, %xmm0		; X64-NEXT: maxss %xmm1, %xmm0
; X64-NEXT: cvttss2si %xmm0, %eax		; X64-NEXT: cvttss2si %xmm0, %eax
; X64-NEXT: ## kill: def $ax killed $ax killed $eax		; X64-NEXT: ## kill: def $ax killed $ax killed $eax
; X64-NEXT: retq		; X64-NEXT: retq
;		;
; X32_AVX1-LABEL: test1:		; X32_AVX1-LABEL: test1:
; X32_AVX1: ## %bb.0:		; X32_AVX1: ## %bb.0:
; X32_AVX1-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero		; X32_AVX1-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; X32_AVX1-NEXT: vaddss LCPI0_0, %xmm0, %xmm0		; X32_AVX1-NEXT: vsubss LCPI0_0, %xmm0, %xmm0
; X32_AVX1-NEXT: vmulss LCPI0_1, %xmm0, %xmm0		; X32_AVX1-NEXT: vmulss LCPI0_1, %xmm0, %xmm0
; X32_AVX1-NEXT: vxorps %xmm1, %xmm1, %xmm1		; X32_AVX1-NEXT: vxorps %xmm1, %xmm1, %xmm1
; X32_AVX1-NEXT: vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]		; X32_AVX1-NEXT: vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
; X32_AVX1-NEXT: vminss LCPI0_2, %xmm0, %xmm0		; X32_AVX1-NEXT: vminss LCPI0_2, %xmm0, %xmm0
; X32_AVX1-NEXT: vmaxss %xmm1, %xmm0, %xmm0		; X32_AVX1-NEXT: vmaxss %xmm1, %xmm0, %xmm0
; X32_AVX1-NEXT: vcvttss2si %xmm0, %eax		; X32_AVX1-NEXT: vcvttss2si %xmm0, %eax
; X32_AVX1-NEXT: ## kill: def $ax killed $ax killed $eax		; X32_AVX1-NEXT: ## kill: def $ax killed $ax killed $eax
; X32_AVX1-NEXT: retl		; X32_AVX1-NEXT: retl
;		;
; X64_AVX1-LABEL: test1:		; X64_AVX1-LABEL: test1:
; X64_AVX1: ## %bb.0:		; X64_AVX1: ## %bb.0:
; X64_AVX1-NEXT: vaddss {{.*}}(%rip), %xmm0, %xmm0		; X64_AVX1-NEXT: vsubss {{.*}}(%rip), %xmm0, %xmm0
; X64_AVX1-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0		; X64_AVX1-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
; X64_AVX1-NEXT: vxorps %xmm1, %xmm1, %xmm1		; X64_AVX1-NEXT: vxorps %xmm1, %xmm1, %xmm1
; X64_AVX1-NEXT: vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]		; X64_AVX1-NEXT: vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
; X64_AVX1-NEXT: vminss {{.*}}(%rip), %xmm0, %xmm0		; X64_AVX1-NEXT: vminss {{.*}}(%rip), %xmm0, %xmm0
; X64_AVX1-NEXT: vmaxss %xmm1, %xmm0, %xmm0		; X64_AVX1-NEXT: vmaxss %xmm1, %xmm0, %xmm0
; X64_AVX1-NEXT: vcvttss2si %xmm0, %eax		; X64_AVX1-NEXT: vcvttss2si %xmm0, %eax
; X64_AVX1-NEXT: ## kill: def $ax killed $ax killed $eax		; X64_AVX1-NEXT: ## kill: def $ax killed $ax killed $eax
; X64_AVX1-NEXT: retq		; X64_AVX1-NEXT: retq
;		;
; X32_AVX512-LABEL: test1:		; X32_AVX512-LABEL: test1:
; X32_AVX512: ## %bb.0:		; X32_AVX512: ## %bb.0:
; X32_AVX512-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero		; X32_AVX512-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; X32_AVX512-NEXT: vaddss LCPI0_0, %xmm0, %xmm0		; X32_AVX512-NEXT: vsubss LCPI0_0, %xmm0, %xmm0
; X32_AVX512-NEXT: vmulss LCPI0_1, %xmm0, %xmm0		; X32_AVX512-NEXT: vmulss LCPI0_1, %xmm0, %xmm0
; X32_AVX512-NEXT: vxorps %xmm1, %xmm1, %xmm1		; X32_AVX512-NEXT: vxorps %xmm1, %xmm1, %xmm1
; X32_AVX512-NEXT: vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]		; X32_AVX512-NEXT: vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
; X32_AVX512-NEXT: vminss LCPI0_2, %xmm0, %xmm0		; X32_AVX512-NEXT: vminss LCPI0_2, %xmm0, %xmm0
; X32_AVX512-NEXT: vxorps %xmm1, %xmm1, %xmm1		; X32_AVX512-NEXT: vxorps %xmm1, %xmm1, %xmm1
; X32_AVX512-NEXT: vmaxss %xmm1, %xmm0, %xmm0		; X32_AVX512-NEXT: vmaxss %xmm1, %xmm0, %xmm0
; X32_AVX512-NEXT: vcvttss2si %xmm0, %eax		; X32_AVX512-NEXT: vcvttss2si %xmm0, %eax
; X32_AVX512-NEXT: ## kill: def $ax killed $ax killed $eax		; X32_AVX512-NEXT: ## kill: def $ax killed $ax killed $eax
; X32_AVX512-NEXT: retl		; X32_AVX512-NEXT: retl
;		;
; X64_AVX512-LABEL: test1:		; X64_AVX512-LABEL: test1:
; X64_AVX512: ## %bb.0:		; X64_AVX512: ## %bb.0:
; X64_AVX512-NEXT: vaddss {{.*}}(%rip), %xmm0, %xmm0		; X64_AVX512-NEXT: vsubss {{.*}}(%rip), %xmm0, %xmm0
; X64_AVX512-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0		; X64_AVX512-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
; X64_AVX512-NEXT: vxorps %xmm1, %xmm1, %xmm1		; X64_AVX512-NEXT: vxorps %xmm1, %xmm1, %xmm1
; X64_AVX512-NEXT: vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]		; X64_AVX512-NEXT: vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
; X64_AVX512-NEXT: vminss {{.*}}(%rip), %xmm0, %xmm0		; X64_AVX512-NEXT: vminss {{.*}}(%rip), %xmm0, %xmm0
; X64_AVX512-NEXT: vxorps %xmm1, %xmm1, %xmm1		; X64_AVX512-NEXT: vxorps %xmm1, %xmm1, %xmm1
; X64_AVX512-NEXT: vmaxss %xmm1, %xmm0, %xmm0		; X64_AVX512-NEXT: vmaxss %xmm1, %xmm0, %xmm0
; X64_AVX512-NEXT: vcvttss2si %xmm0, %eax		; X64_AVX512-NEXT: vcvttss2si %xmm0, %eax
; X64_AVX512-NEXT: ## kill: def $ax killed $ax killed $eax		; X64_AVX512-NEXT: ## kill: def $ax killed $ax killed $eax
; X64_AVX512-NEXT: retq		; X64_AVX512-NEXT: retq
[... 10 lines not shown ...]
%tmp69 = trunc i32 %tmp.upgrd.1 to i16 ; <i16> [#uses=1]		%tmp69 = trunc i32 %tmp.upgrd.1 to i16 ; <i16> [#uses=1]
ret i16 %tmp69		ret i16 %tmp69
}		}

define i16 @test2(float %f) nounwind {		define i16 @test2(float %f) nounwind {
; X32-LABEL: test2:		; X32-LABEL: test2:
; X32: ## %bb.0:		; X32: ## %bb.0:
; X32-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero		; X32-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; X32-NEXT: addss LCPI1_0, %xmm0		; X32-NEXT: subss LCPI1_0, %xmm0
; X32-NEXT: mulss LCPI1_1, %xmm0		; X32-NEXT: mulss LCPI1_1, %xmm0
; X32-NEXT: minss LCPI1_2, %xmm0		; X32-NEXT: minss LCPI1_2, %xmm0
; X32-NEXT: xorps %xmm1, %xmm1		; X32-NEXT: xorps %xmm1, %xmm1
; X32-NEXT: maxss %xmm1, %xmm0		; X32-NEXT: maxss %xmm1, %xmm0
; X32-NEXT: cvttss2si %xmm0, %eax		; X32-NEXT: cvttss2si %xmm0, %eax
; X32-NEXT: ## kill: def $ax killed $ax killed $eax		; X32-NEXT: ## kill: def $ax killed $ax killed $eax
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test2:		; X64-LABEL: test2:
; X64: ## %bb.0:		; X64: ## %bb.0:
; X64-NEXT: addss {{.*}}(%rip), %xmm0		; X64-NEXT: subss {{.*}}(%rip), %xmm0
; X64-NEXT: mulss {{.*}}(%rip), %xmm0		; X64-NEXT: mulss {{.*}}(%rip), %xmm0
; X64-NEXT: minss {{.*}}(%rip), %xmm0		; X64-NEXT: minss {{.*}}(%rip), %xmm0
; X64-NEXT: xorps %xmm1, %xmm1		; X64-NEXT: xorps %xmm1, %xmm1
; X64-NEXT: maxss %xmm1, %xmm0		; X64-NEXT: maxss %xmm1, %xmm0
; X64-NEXT: cvttss2si %xmm0, %eax		; X64-NEXT: cvttss2si %xmm0, %eax
; X64-NEXT: ## kill: def $ax killed $ax killed $eax		; X64-NEXT: ## kill: def $ax killed $ax killed $eax
; X64-NEXT: retq		; X64-NEXT: retq
;		;
; X32_AVX-LABEL: test2:		; X32_AVX-LABEL: test2:
; X32_AVX: ## %bb.0:		; X32_AVX: ## %bb.0:
; X32_AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero		; X32_AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; X32_AVX-NEXT: vaddss LCPI1_0, %xmm0, %xmm0		; X32_AVX-NEXT: vsubss LCPI1_0, %xmm0, %xmm0
; X32_AVX-NEXT: vmulss LCPI1_1, %xmm0, %xmm0		; X32_AVX-NEXT: vmulss LCPI1_1, %xmm0, %xmm0
; X32_AVX-NEXT: vminss LCPI1_2, %xmm0, %xmm0		; X32_AVX-NEXT: vminss LCPI1_2, %xmm0, %xmm0
; X32_AVX-NEXT: vxorps %xmm1, %xmm1, %xmm1		; X32_AVX-NEXT: vxorps %xmm1, %xmm1, %xmm1
; X32_AVX-NEXT: vmaxss %xmm1, %xmm0, %xmm0		; X32_AVX-NEXT: vmaxss %xmm1, %xmm0, %xmm0
; X32_AVX-NEXT: vcvttss2si %xmm0, %eax		; X32_AVX-NEXT: vcvttss2si %xmm0, %eax
; X32_AVX-NEXT: ## kill: def $ax killed $ax killed $eax		; X32_AVX-NEXT: ## kill: def $ax killed $ax killed $eax
; X32_AVX-NEXT: retl		; X32_AVX-NEXT: retl
;		;
; X64_AVX-LABEL: test2:		; X64_AVX-LABEL: test2:
; X64_AVX: ## %bb.0:		; X64_AVX: ## %bb.0:
; X64_AVX-NEXT: vaddss {{.*}}(%rip), %xmm0, %xmm0		; X64_AVX-NEXT: vsubss {{.*}}(%rip), %xmm0, %xmm0
; X64_AVX-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0		; X64_AVX-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
; X64_AVX-NEXT: vminss {{.*}}(%rip), %xmm0, %xmm0		; X64_AVX-NEXT: vminss {{.*}}(%rip), %xmm0, %xmm0
; X64_AVX-NEXT: vxorps %xmm1, %xmm1, %xmm1		; X64_AVX-NEXT: vxorps %xmm1, %xmm1, %xmm1
; X64_AVX-NEXT: vmaxss %xmm1, %xmm0, %xmm0		; X64_AVX-NEXT: vmaxss %xmm1, %xmm0, %xmm0
; X64_AVX-NEXT: vcvttss2si %xmm0, %eax		; X64_AVX-NEXT: vcvttss2si %xmm0, %eax
; X64_AVX-NEXT: ## kill: def $ax killed $ax killed $eax		; X64_AVX-NEXT: ## kill: def $ax killed $ax killed $eax
; X64_AVX-NEXT: retq		; X64_AVX-NEXT: retq
%tmp28 = fsub float %f, 1.000000e+00 ; <float> [#uses=1]		%tmp28 = fsub float %f, 1.000000e+00 ; <float> [#uses=1]


Revision Contents

Diff 246636

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

llvm/test/CodeGen/AMDGPU/fma-combine.ll

llvm/test/CodeGen/AMDGPU/fmuladd.f16.ll

llvm/test/CodeGen/AMDGPU/fmuladd.f32.ll

llvm/test/CodeGen/AMDGPU/fsub.f16.ll

llvm/test/CodeGen/AMDGPU/reduction.ll

llvm/test/CodeGen/AMDGPU/v_mac.ll

llvm/test/CodeGen/PowerPC/qpx-recipest.ll

llvm/test/CodeGen/X86/dag-fmf-cse.ll

llvm/test/CodeGen/X86/fma_patterns.ll

llvm/test/CodeGen/X86/fma_patterns_wide.ll

llvm/test/CodeGen/X86/fp-fold.ll

llvm/test/CodeGen/X86/fp_constant_op.ll

llvm/test/CodeGen/X86/limited-prec.ll

llvm/test/CodeGen/X86/load-scalar-as-vector.ll

llvm/test/CodeGen/X86/negative-sin.ll

llvm/test/CodeGen/X86/pr44749.ll

llvm/test/CodeGen/X86/vec_ss_load_fold.ll
