This is an archive of the discontinued LLVM Phabricator instance.

[X86] Don't fold (fneg (fma (fneg X), Y, (fneg Z))) to (fma X, Y, Z)
ClosedPublic

Authored by Jim on Nov 5 2020, 10:30 PM.

Details

Reviewers

steven.zhang
RKSimon
spatel
qiucf

Commits

rG445680593889: [X86] Don't fold (fneg (fma (fneg X), Y, (fneg Z))) to (fma X, Y, Z)

Summary

Check for the no-signed-zeros flag (nsz) in getNegatedExpression for x86 before performing this fold.
This patch fixes a miscompilation: https://alive2.llvm.org/ce/z/XxwBAJ

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Jim created this revision. Nov 5 2020, 10:30 PM

Herald added a project: Restricted Project. Nov 5 2020, 10:30 PM

Herald added subscribers: llvm-commits, ecnelises, hiraditya.

Jim requested review of this revision. Nov 5 2020, 10:30 PM

No test changes?

The transform would be legal with the no-signed-zeros fast-math flag, right? I believe that's checked in getNegatedExpression, but not in the X86-specific override.

I think the same issue may also exist in ARMInstrVFP.td. It matches both (fneg (fma x, y, z)) and (fma (fneg x), y, (fneg z)) to the same instruction.

// Match @llvm.fma.* intrinsics
// (fneg (fma x, y, z)) -> (vfnma z, x, y)
def : Pat<(fneg (fma (f64 DPR:$Dn), (f64 DPR:$Dm), (f64 DPR:$Ddin))),
          (VFNMAD DPR:$Ddin, DPR:$Dn, DPR:$Dm)>,
      Requires<[HasVFP4,HasDPVFP]>;
def : Pat<(fneg (fma (f32 SPR:$Sn), (f32 SPR:$Sm), (f32 SPR:$Sdin))),
          (VFNMAS SPR:$Sdin, SPR:$Sn, SPR:$Sm)>,
      Requires<[HasVFP4]>;
def : Pat<(fneg (fma (f16 HPR:$Sn), (f16 HPR:$Sm), (f16 (f16 HPR:$Sdin)))),
          (VFNMAH (f16 HPR:$Sdin), (f16 HPR:$Sn), (f16 HPR:$Sm))>,
      Requires<[HasFullFP16]>;
// (fma (fneg x), y, (fneg z)) -> (vfnma z, x, y)
def : Pat<(f64 (fma (fneg DPR:$Dn), DPR:$Dm, (fneg DPR:$Ddin))),
          (VFNMAD DPR:$Ddin, DPR:$Dn, DPR:$Dm)>,
      Requires<[HasVFP4,HasDPVFP]>;
def : Pat<(f32 (fma (fneg SPR:$Sn), SPR:$Sm, (fneg SPR:$Sdin))),
          (VFNMAS SPR:$Sdin, SPR:$Sn, SPR:$Sm)>,
      Requires<[HasVFP4]>;
def : Pat<(f16 (fma (fneg (f16 HPR:$Sn)), (f16 HPR:$Sm), (fneg (f16 HPR:$Sdin)))),
          (VFNMAH (f16 HPR:$Sdin), (f16 HPR:$Sn), (f16 HPR:$Sm))>,
      Requires<[HasFullFP16]>;
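
The concern with matching both source patterns to one instruction is whether (fneg (fma x, y, z)) and (fma (fneg x), y, (fneg z)) always produce the same bits. A minimal C++ sketch of that comparison follows, assuming std::fma as a stand-in for the DAG fma node and default (non-fast-math) compilation; the function names negOfFma and fmaOfNegs are hypothetical and do not appear in LLVM.

```cpp
#include <cmath>

// Models (fneg (fma x, y, z)): negate the fused result.
double negOfFma(double x, double y, double z) {
  return -std::fma(x, y, z);
}

// Models (fma (fneg x), y, (fneg z)): negate two operands before fusing.
double fmaOfNegs(double x, double y, double z) {
  return std::fma(-x, y, -z);
}

// True when the two forms disagree in value or in the sign of zero.
bool formsDisagree(double x, double y, double z) {
  double a = negOfFma(x, y, z);
  double b = fmaOfNegs(x, y, z);
  return a != b || std::signbit(a) != std::signbit(b);
}
```

Under round-to-nearest, an input such as x = 0.0, y = 1.0, z = -0.0 makes the first form return -0.0 and the second +0.0, so the two forms are only interchangeable when the sign of zero may be ignored.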

craig.topper added a reviewer: spatel. Nov 5 2020, 10:45 PM

Harbormaster completed remote builds in B77821: Diff 303331. Nov 5 2020, 11:25 PM

Update testcase.

Herald added a subscriber: nemanjai. Nov 6 2020, 1:21 AM

lkail added a subscriber: shchenz. Nov 6 2020, 1:24 AM

In D90901#2378066, @craig.topper wrote:

No test changes?

The transform would be legal with the no-signed-zeros fast-math flag, right? I believe that's checked in getNegatedExpression, but not in the X86-specific override.

Yes, it is legal with the no-signed-zeros fast-math flag. I see that PowerPC deals with this case specifically.

Should I add a no-signed-zeros condition that permits this transform instead of deleting it?

steven.zhang added a reviewer: qiucf. Nov 6 2020, 2:34 AM

In D90901#2378338, @Jim wrote:

Yes, it is legal with the no-signed-zeros fast-math flag. I see that PowerPC deals with this case specifically.

Should I add a no-signed-zeros condition that permits this transform instead of deleting it?

I think so. As Craig pointed out, the default implementation of getNegatedExpression takes care of the fast-math flags. You need to check nsz inside X86::getNegatedExpression() when performing any fold that might change the sign bit of zero.

Also, a target-specific test is missing. The PowerPC test change is unexpected, as the nsz flags are already there.
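
The shape of the requested guard can be sketched outside of LLVM. The following is a toy C++ model, not the SelectionDAG code: ToyFlags is a hypothetical stand-in for per-node fast-math flags, and evalNegatedFma is an invented helper that takes the cheaper folded form only when the flag allows the sign of zero to be ignored.

```cpp
#include <cmath>

// Hypothetical stand-in for per-node fast-math flags; this is not
// LLVM's SDNodeFlags, only an illustration of the gating idea.
struct ToyFlags {
  bool NoSignedZeros = false;
};

// Evaluates (fneg (fma (fneg x), y, (fneg z))).
// With NoSignedZeros set, the folded form (fma x, y, z) is used, which
// may return a zero of the opposite sign. Without it, the expression is
// evaluated as written so the sign of zero is preserved.
double evalNegatedFma(double x, double y, double z, ToyFlags flags) {
  if (flags.NoSignedZeros)
    return std::fma(x, y, z);
  return -std::fma(-x, y, -z);
}
```

For non-zero results both paths agree; they can differ only when the result is a zero, which is exactly the case the flag declares irrelevant.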

Harbormaster completed remote builds in B77839: Diff 303364. Nov 6 2020, 3:00 AM

In D90901#2378427, @steven.zhang wrote:

Also, a target-specific test is missing. The PowerPC test change is unexpected, as the nsz flags are already there.

+1. The bug(s) appear to be in target-specific code, so we need a minimal test for x86 (and ARM and possibly others) to go with the code fix. IIUC, something like this:

declare double @llvm.fma.f64(double, double, double)

define double @fneg_fma(double %x, double %y, double %z) {
  %negx = fneg double %x
  %negz = fneg double %z
  %fma = call double @llvm.fma.f64(double %negx, double %y, double %negz)
  %n = fneg double %fma
  ret double %n
}

Currently, that is transformed to vfmadd213sd, but that's a miscompile. We can simulate that in IR with Alive2:
https://alive2.llvm.org/ce/z/XxwBAJ
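
A counterexample of the kind Alive2 reports can also be found by brute force on the host. The sketch below is an illustration under stated assumptions rather than part of the patch: it uses std::fma in place of the IR intrinsic, assumes default round-to-nearest and no fast-math compilation, and the names Mismatch, sameValueAndSign and findMismatches are hypothetical.

```cpp
#include <cmath>
#include <vector>

// One input triple on which the original and folded forms disagree.
struct Mismatch {
  double x, y, z;
};

// True when a and b are equal and also carry the same sign bit,
// so +0.0 and -0.0 are treated as different.
bool sameValueAndSign(double a, double b) {
  return a == b && std::signbit(a) == std::signbit(b);
}

// Compares fneg(fma(fneg x, y, fneg z)) with fma(x, y, z) for every
// triple drawn from a small set of candidate values.
std::vector<Mismatch> findMismatches() {
  const double candidates[] = {0.0, -0.0, 1.0, -1.0};
  std::vector<Mismatch> result;
  for (double x : candidates)
    for (double y : candidates)
      for (double z : candidates) {
        double original = -std::fma(-x, y, -z);
        double folded = std::fma(x, y, z);
        if (!sameValueAndSign(original, folded))
          result.push_back({x, y, z});
      }
  return result;
}
```

Every triple this search reports has x * y + z equal to zero, for example x = 1.0, y = 1.0, z = -1.0, which is why the fold is safe once nsz is present.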

reverse ping

As @spatel said, we need more per-target test coverage for this

This revision now requires changes to proceed. May 4 2021, 7:13 AM

Check for the no-signed-zeros flag (nsz) or the NoSignedZerosFPMath option
in getNegatedExpression for x86.

Herald added a subscriber: pengfei. May 16 2021, 12:59 AM

Jim retitled this revision from [DAGCombiner] Don't fold ((fma (fneg X), Y, (fneg Z)) to fneg (fma X, Y, Z)) to [X86] Don't fold (fneg (fma (fneg X), Y, (fneg Z))) to (fma X, Y, Z). May 16 2021, 1:00 AM

Jim edited the summary of this revision.

Harbormaster completed remote builds in B104688: Diff 345677. May 16 2021, 1:31 AM

qiucf added inline comments. May 17 2021, 2:47 AM

llvm/test/CodeGen/X86/fma-signed-zero.ll
12	You can pre-commit this test (or at least stage it first and run git diff) so that it is clear which optimizations are prevented.

Address @qiucf's comment.

Pre-commit patch https://reviews.llvm.org/D102621.

Harbormaster completed remote builds in B104810: Diff 345856. May 17 2021, 6:27 AM

spatel mentioned this in rG8854b27b198c: [x86] update fma test with deprecated intrinsics; NFC. May 17 2021, 8:07 AM

spatel added inline comments. May 17 2021, 8:16 AM

llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll
3	To confirm: this patch is based on a local update with the test changes? We really want to avoid adding llc options to toggle the FP state as discussed in D99080 (so this diff would invert what we want to do there). Unfortunately, this is a yak shaving exercise because the tests are using deprecated target-specific intrinsics. I hopefully fixed that for this file at least here: 8854b27b19. Ideally, we can now add tests with FMF (nsz on the appropriate instructions) to preserve the intent of these tests and also demonstrate the bug. The fma-fneg-combine.ll file has the same problem. Let me know if I should update that or if you want to give that a try.

Jim added a parent revision: D102621: [X86] Pre-commit test for D90901. May 17 2021, 7:32 PM

Jim added inline comments. May 17 2021, 10:30 PM

llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll
3	I don't understand what you mean by "this patch is based on a local update with the test changes?" I am not sure how to update the tests in fma-fneg-combine.ll: the intrinsics used there take additional arguments, not just 3. I will update avx2-fma-fneg-combine.ll by adding FMF to the instructions.

spatel mentioned this in D102725: [SDAG] propagate FMF from target-specific IR intrinsics. May 18 2021, 2:05 PM

spatel added inline comments. May 18 2021, 2:10 PM

llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll
3	Sorry - I didn't realize the tests in fma-fneg-combine.ll were for still active target-specific nodes. I posted D102725, so we can try to convert those to use IR-level FMF too. I think you can update this patch to use IR-level FMF on avx2-fma-fneg-combine.ll now. It would be good to add tests with 'nsz' and leave the existing tests as-is. That way we'll show that we are not miscompiling but we are optimizing if possible.

Jim added inline comments. May 18 2021, 8:50 PM

llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll
3	I found that llvm.x86.avx512.vfmadd.ps.512 in fma-fneg-combine.ll is lowered to a generic fma but loses its FMF. Do you have any suggestions for fixing it?

spatel mentioned this in rG6025663578cd: [SDAG] propagate FMF from target-specific IR intrinsics. May 19 2021, 4:52 AM

spatel mentioned this in rGf66ba4cfa7ca: [x86] propagate FMF from x86-specific intrinsic nodes to others during lowering. May 19 2021, 10:15 AM

spatel mentioned this in rG333c968d4003: [x86] update fma test with deprecated intrinsics; NFC. May 19 2021, 11:05 AM

spatel mentioned this in rGf12f9beb0428: [x86] propagate FMF from x86-specific intrinsic nodes to others during combining. May 19 2021, 11:25 AM

spatel mentioned this in rG9b59a61cfc4e: [x86] add tests for fma folds with fast-math-flags; NFC. May 19 2021, 11:32 AM

spatel added inline comments. May 19 2021, 11:40 AM

llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll
3	Yes, it's a mess. Please have a look and update this patch after: f66ba4cfa, 333c968, f12f9be, 9b59a61. If I got it right, we have the necessary test coverage in fma-fneg-combine.ll now, so there is no need to change the RUN lines in that file. Just regenerate the CHECK lines using utils/update_llc_test_checks.py after applying this patch.

Rebase and update testcases.

Harbormaster completed remote builds in B105345: Diff 346618. May 19 2021, 7:40 PM

spatel added inline comments. May 20 2021, 5:38 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
47066	Do not check the TargetOptions here; that's the legacy/deprecated construct. We should have fixed FMF propagation enough at this point, so it should not be necessary. And we should have enough test coverage to verify that (although it's hard to tell what is redundant/missing in the existing tests). That also means we should not use target options or function attrs in the new test file. I'll comment directly on D102621 to make that clearer.

Address @spatel's comment

Harbormaster completed remote builds in B105553: Diff 346913. May 20 2021, 8:34 PM

spatel added inline comments. May 21 2021, 5:14 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
47049	Don't need to check TargetOptions?

Address @spatel's comment.

LGTM - thanks for working through the FMF/test updates!
Please update the commit message since we are not checking the target options now.
Also, you may want to add a link to an Alive2 proof and mention that this patch fixes miscompiles:
https://alive2.llvm.org/ce/z/XxwBAJ

Harbormaster completed remote builds in B105621: Diff 347012. May 21 2021, 7:06 AM

LGTM (just to unblock) - @spatel has the final decision - cheers

This revision is now accepted and ready to land. May 21 2021, 7:16 AM

Thanks! @spatel @RKSimon

Closed by commit rG445680593889: [X86] Don't fold (fneg (fma (fneg X), Y, (fneg Z))) to (fma X, Y, Z) (authored by Jim). May 21 2021, 7:59 AM

This revision was automatically updated to reflect the committed changes.

Jim mentioned this in rG35e5c3310fb0: [X86] Pre-commit test for D90901.

Jim added a commit: rG445680593889: [X86] Don't fold (fneg (fma (fneg X), Y, (fneg Z))) to (fma X, Y, Z).

craig.topper mentioned this in D109523: [ARM] Remove isel patterns that start with (fneg (fma)). Sep 9 2021, 9:30 AM

craig.topper mentioned this in D109525: [SVE] Only combine (fneg (fma)) => FNMLA with nsz. Sep 9 2021, 9:43 AM

peterwaller-arm mentioned this in rG921e89c59a71: [SVE] Only combine (fneg (fma)) => FNMLA with nsz. Dec 13 2021, 3:35 AM

craig.topper mentioned this in D126852: [RISCV] Add more patterns for FNMADD. Jun 1 2022, 9:00 PM

GitHub <noreply@github.com> mentioned this in rG5e7e0d603204: [LoongArch] Fix pattern for FNMSUB_{S/D} instructions (#73742). Tue, Nov 28, 11:21 PM

Revision Contents

Path	Size
llvm/lib/Target/X86/X86ISelLowering.cpp	6 lines
llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll	16 lines
(four further entries whose paths were not preserved)	91 lines, 6 lines, 76 lines, 44 lines

Diff 347033

llvm/lib/Target/X86/X86ISelLowering.cpp

[47,039 lines not shown]
SDValue X86TargetLowering::getNegatedExpression(SDValue Op, SelectionDAG &DAG,
[...]
   if (SDValue Arg = isFNEG(DAG, Op.getNode(), Depth)) {
     Cost = NegatibleCost::Cheaper;
     return DAG.getBitcast(Op.getValueType(), Arg);
   }

   EVT VT = Op.getValueType();
   EVT SVT = VT.getScalarType();
   unsigned Opc = Op.getOpcode();
+  SDNodeFlags Flags = Op.getNode()->getFlags();
   switch (Opc) {
   case ISD::FMA:
   case X86ISD::FMSUB:
   case X86ISD::FNMADD:
   case X86ISD::FNMSUB:
   case X86ISD::FMADD_RND:
   case X86ISD::FMSUB_RND:
   case X86ISD::FNMADD_RND:
   case X86ISD::FNMSUB_RND: {
     if (!Op.hasOneUse() || !Subtarget.hasAnyFMA() || !isTypeLegal(VT) ||
         !(SVT == MVT::f32 || SVT == MVT::f64) ||
         !isOperationLegal(ISD::FMA, VT))
       break;

+    // Don't fold (fneg (fma (fneg x), y, (fneg z))) to (fma x, y, z)
+    // if it may have signed zeros.
+    if (!Flags.hasNoSignedZeros())
+      break;
+
     // This is always negatible for free but we might be able to remove some
     // extra operand negations as well.
     SmallVector<SDValue, 4> NewOps(Op.getNumOperands(), SDValue());
     for (int i = 0; i != 3; ++i)
       NewOps[i] = getCheaperNegatedExpression(
           Op.getOperand(i), DAG, LegalOperations, ForCodeSize, Depth + 1);

     bool NegA = !!NewOps[0];
[5,138 lines not shown]

llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx2,+fma | FileCheck %s --check-prefix=X32
 ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2,+fma | FileCheck %s --check-prefix=X64

 declare <8 x float> @llvm.fma.v8f32(<8 x float>, <8 x float>, <8 x float>)
 declare <4 x float> @llvm.fma.v4f32(<4 x float>, <4 x float>, <4 x float>)
 declare float @llvm.fma.f32(float, float, float)
 declare <2 x double> @llvm.fma.v2f64(<2 x double>, <2 x double>, <2 x double>)

 ; This test checks combinations of FNEG and FMA intrinsics

 define <8 x float> @test1(<8 x float> %a, <8 x float> %b, <8 x float> %c) {
 ; X32-LABEL: test1:
 ; X32: # %bb.0:
 ; X32-NEXT: vfmsub213ps {{.*#+}} ymm0 = (ymm1 * ymm0) - ymm2
 ; X32-NEXT: retl
 ;
 ; X64-LABEL: test1:
 ; X64: # %bb.0:
 ; X64-NEXT: vfmsub213ps {{.*#+}} ymm0 = (ymm1 * ymm0) - ymm2
 ; X64-NEXT: retq
   %sub.i = fsub <8 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %c
-  %r = tail call <8 x float> @llvm.fma.v8f32(<8 x float> %a, <8 x float> %b, <8 x float> %sub.i) #2
+  %r = tail call nsz <8 x float> @llvm.fma.v8f32(<8 x float> %a, <8 x float> %b, <8 x float> %sub.i) #2
   ret <8 x float> %r
 }

 define <4 x float> @test2(<4 x float> %a, <4 x float> %b, <4 x float> %c) {
 ; X32-LABEL: test2:
 ; X32: # %bb.0:
 ; X32-NEXT: vfnmsub213ps {{.*#+}} xmm0 = -(xmm1 * xmm0) - xmm2
 ; X32-NEXT: retl
 ;
 ; X64-LABEL: test2:
 ; X64: # %bb.0:
 ; X64-NEXT: vfnmsub213ps {{.*#+}} xmm0 = -(xmm1 * xmm0) - xmm2
 ; X64-NEXT: retq
-  %t0 = tail call <4 x float> @llvm.fma.v4f32(<4 x float> %a, <4 x float> %b, <4 x float> %c) #2
+  %t0 = tail call nsz <4 x float> @llvm.fma.v4f32(<4 x float> %a, <4 x float> %b, <4 x float> %c) #2
   %sub.i = fsub <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %t0
   ret <4 x float> %sub.i
 }

 define <4 x float> @test3(<4 x float> %a, <4 x float> %b, <4 x float> %c) {
 ; X32-LABEL: test3:
 ; X32: # %bb.0:
 ; X32-NEXT: vfnmadd213ss {{.*#+}} xmm0 = -(xmm1 * xmm0) + xmm2
 ; X32-NEXT: vbroadcastss {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
 ; X32-NEXT: vxorps %xmm1, %xmm0, %xmm0
 ; X32-NEXT: retl
 ;
 ; X64-LABEL: test3:
 ; X64: # %bb.0:
 ; X64-NEXT: vfnmadd213ss {{.*#+}} xmm0 = -(xmm1 * xmm0) + xmm2
 ; X64-NEXT: vbroadcastss {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
 ; X64-NEXT: vxorps %xmm1, %xmm0, %xmm0
 ; X64-NEXT: retq
   %a0 = extractelement <4 x float> %a, i64 0
   %b0 = extractelement <4 x float> %b, i64 0
   %c0 = extractelement <4 x float> %c, i64 0
   %negb0 = fneg float %b0
-  %t0 = tail call float @llvm.fma.f32(float %a0, float %negb0, float %c0) #2
+  %t0 = tail call nsz float @llvm.fma.f32(float %a0, float %negb0, float %c0) #2
   %i = insertelement <4 x float> %a, float %t0, i64 0
   %sub.i = fsub <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %i
   ret <4 x float> %sub.i
 }

 define <8 x float> @test4(<8 x float> %a, <8 x float> %b, <8 x float> %c) {
 ; X32-LABEL: test4:
 ; X32: # %bb.0:
 ; X32-NEXT: vfnmadd213ps {{.*#+}} ymm0 = -(ymm1 * ymm0) + ymm2
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test4:			; X64-LABEL: test4:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: vfnmadd213ps {{.#+}} ymm0 = -(ymm1 ymm0) + ymm2			; X64-NEXT: vfnmadd213ps {{.#+}} ymm0 = -(ymm1 ymm0) + ymm2
	; X64-NEXT: retq			; X64-NEXT: retq
	%negc = fneg <8 x float> %c			%negc = fneg <8 x float> %c
	%t0 = tail call <8 x float> @llvm.fma.v8f32(<8 x float> %a, <8 x float> %b, <8 x float> %negc) #2			%t0 = tail call nsz <8 x float> @llvm.fma.v8f32(<8 x float> %a, <8 x float> %b, <8 x float> %negc) #2
	%sub.i = fsub <8 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %t0			%sub.i = fsub <8 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %t0
	ret <8 x float> %sub.i			ret <8 x float> %sub.i
	}			}

	define <8 x float> @test5(<8 x float> %a, <8 x float> %b, <8 x float> %c) {			define <8 x float> @test5(<8 x float> %a, <8 x float> %b, <8 x float> %c) {
	; X32-LABEL: test5:			; X32-LABEL: test5:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: vfmadd213ps {{.#+}} ymm0 = (ymm1 ymm0) + ymm2			; X32-NEXT: vfmadd213ps {{.#+}} ymm0 = (ymm1 ymm0) + ymm2
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test5:			; X64-LABEL: test5:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: vfmadd213ps {{.#+}} ymm0 = (ymm1 ymm0) + ymm2			; X64-NEXT: vfmadd213ps {{.#+}} ymm0 = (ymm1 ymm0) + ymm2
	; X64-NEXT: retq			; X64-NEXT: retq
	%sub.c = fsub <8 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %c			%sub.c = fsub <8 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %c
	%negsubc = fneg <8 x float> %sub.c			%negsubc = fneg <8 x float> %sub.c
	%t0 = tail call <8 x float> @llvm.fma.v8f32(<8 x float> %a, <8 x float> %b, <8 x float> %negsubc) #2			%t0 = tail call nsz <8 x float> @llvm.fma.v8f32(<8 x float> %a, <8 x float> %b, <8 x float> %negsubc) #2
	ret <8 x float> %t0			ret <8 x float> %t0
	}			}

	define <2 x double> @test6(<2 x double> %a, <2 x double> %b, <2 x double> %c) {			define <2 x double> @test6(<2 x double> %a, <2 x double> %b, <2 x double> %c) {
	; X32-LABEL: test6:			; X32-LABEL: test6:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: vfnmsub213pd {{.#+}} xmm0 = -(xmm1 xmm0) - xmm2			; X32-NEXT: vfnmsub213pd {{.#+}} xmm0 = -(xmm1 xmm0) - xmm2
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test6:			; X64-LABEL: test6:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: vfnmsub213pd {{.#+}} xmm0 = -(xmm1 xmm0) - xmm2			; X64-NEXT: vfnmsub213pd {{.#+}} xmm0 = -(xmm1 xmm0) - xmm2
	; X64-NEXT: retq			; X64-NEXT: retq
	%t0 = tail call <2 x double> @llvm.fma.v2f64(<2 x double> %a, <2 x double> %b, <2 x double> %c) #2			%t0 = tail call nsz <2 x double> @llvm.fma.v2f64(<2 x double> %a, <2 x double> %b, <2 x double> %c) #2
	%sub.i = fsub <2 x double> <double -0.0, double -0.0>, %t0			%sub.i = fsub <2 x double> <double -0.0, double -0.0>, %t0
	ret <2 x double> %sub.i			ret <2 x double> %sub.i
	}			}

	define <8 x float> @test7(float %a, <8 x float> %b, <8 x float> %c) {			define <8 x float> @test7(float %a, <8 x float> %b, <8 x float> %c) {
	; X32-LABEL: test7:			; X32-LABEL: test7:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: vbroadcastss {{[0-9]+}}(%esp), %ymm2			; X32-NEXT: vbroadcastss {{[0-9]+}}(%esp), %ymm2
	; X32-NEXT: vfnmadd213ps {{.#+}} ymm0 = -(ymm2 ymm0) + ymm1			; X32-NEXT: vfnmadd213ps {{.#+}} ymm0 = -(ymm2 ymm0) + ymm1
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test7:			; X64-LABEL: test7:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: vbroadcastss %xmm0, %ymm0			; X64-NEXT: vbroadcastss %xmm0, %ymm0
	; X64-NEXT: vfnmadd213ps {{.#+}} ymm0 = -(ymm1 ymm0) + ymm2			; X64-NEXT: vfnmadd213ps {{.#+}} ymm0 = -(ymm1 ymm0) + ymm2
	; X64-NEXT: retq			; X64-NEXT: retq
	%t0 = insertelement <8 x float> undef, float %a, i32 0			%t0 = insertelement <8 x float> undef, float %a, i32 0
	%t1 = fsub <8 x float> <float -0.0, float undef, float undef, float undef, float undef, float undef, float undef, float undef>, %t0			%t1 = fsub <8 x float> <float -0.0, float undef, float undef, float undef, float undef, float undef, float undef, float undef>, %t0
	%t2 = shufflevector <8 x float> %t1, <8 x float> undef, <8 x i32> zeroinitializer			%t2 = shufflevector <8 x float> %t1, <8 x float> undef, <8 x i32> zeroinitializer
	%t3 = tail call <8 x float> @llvm.fma.v8f32(<8 x float> %t2, <8 x float> %b, <8 x float> %c)			%t3 = tail call nsz <8 x float> @llvm.fma.v8f32(<8 x float> %t2, <8 x float> %b, <8 x float> %c)
	ret <8 x float> %t3			ret <8 x float> %t3

	}			}

	define <8 x float> @test8(float %a, <8 x float> %b, <8 x float> %c) {			define <8 x float> @test8(float %a, <8 x float> %b, <8 x float> %c) {
	; X32-LABEL: test8:			; X32-LABEL: test8:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: vbroadcastss {{[0-9]+}}(%esp), %ymm2			; X32-NEXT: vbroadcastss {{[0-9]+}}(%esp), %ymm2
	; X32-NEXT: vfnmadd213ps {{.#+}} ymm0 = -(ymm2 ymm0) + ymm1			; X32-NEXT: vfnmadd213ps {{.#+}} ymm0 = -(ymm2 ymm0) + ymm1
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test8:			; X64-LABEL: test8:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: vbroadcastss %xmm0, %ymm0			; X64-NEXT: vbroadcastss %xmm0, %ymm0
	; X64-NEXT: vfnmadd213ps {{.#+}} ymm0 = -(ymm1 ymm0) + ymm2			; X64-NEXT: vfnmadd213ps {{.#+}} ymm0 = -(ymm1 ymm0) + ymm2
	; X64-NEXT: retq			; X64-NEXT: retq
	%t0 = fsub float -0.0, %a			%t0 = fsub float -0.0, %a
	%t1 = insertelement <8 x float> undef, float %t0, i32 0			%t1 = insertelement <8 x float> undef, float %t0, i32 0
	%t2 = shufflevector <8 x float> %t1, <8 x float> undef, <8 x i32> zeroinitializer			%t2 = shufflevector <8 x float> %t1, <8 x float> undef, <8 x i32> zeroinitializer
	%t3 = tail call <8 x float> @llvm.fma.v8f32(<8 x float> %t2, <8 x float> %b, <8 x float> %c)			%t3 = tail call nsz <8 x float> @llvm.fma.v8f32(<8 x float> %t2, <8 x float> %b, <8 x float> %c)
	ret <8 x float> %t3			ret <8 x float> %t3
	}			}

llvm/test/CodeGen/X86/fma-fneg-combine.ll

	Show All 23 Lines
	; CHECK-NEXT: vfmsub213ps {{.#+}} zmm0 = (zmm1 zmm0) - zmm2			; CHECK-NEXT: vfmsub213ps {{.#+}} zmm0 = (zmm1 zmm0) - zmm2
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%sub.i = fneg <16 x float> %c			%sub.i = fneg <16 x float> %c
	%t0 = tail call <16 x float> @llvm.x86.avx512.vfmadd.ps.512(<16 x float> %a, <16 x float> %b, <16 x float> %sub.i, i32 4)			%t0 = tail call <16 x float> @llvm.x86.avx512.vfmadd.ps.512(<16 x float> %a, <16 x float> %b, <16 x float> %sub.i, i32 4)
	ret <16 x float> %t0			ret <16 x float> %t0
	}			}

	define <16 x float> @test2(<16 x float> %a, <16 x float> %b, <16 x float> %c) {			define <16 x float> @test2(<16 x float> %a, <16 x float> %b, <16 x float> %c) {
	; CHECK-LABEL: test2:			; SKX-LABEL: test2:
	; CHECK: # %bb.0:			; SKX: # %bb.0:
	; CHECK-NEXT: vfnmsub213ps {{.#+}} zmm0 = -(zmm1 zmm0) - zmm2			; SKX-NEXT: vfmadd213ps {{.#+}} zmm0 = (zmm1 zmm0) + zmm2
	; CHECK-NEXT: retq			; SKX-NEXT: vxorps {{.*}}(%rip){1to16}, %zmm0, %zmm0
				; SKX-NEXT: retq
				;
				; KNL-LABEL: test2:
				; KNL: # %bb.0:
				; KNL-NEXT: vfmadd213ps {{.#+}} zmm0 = (zmm1 zmm0) + zmm2
				; KNL-NEXT: vpxord {{.*}}(%rip){1to16}, %zmm0, %zmm0
				; KNL-NEXT: retq
	%fma = call <16 x float> @llvm.fma.v16f32(<16 x float> %a, <16 x float> %b, <16 x float> %c)			%fma = call <16 x float> @llvm.fma.v16f32(<16 x float> %a, <16 x float> %b, <16 x float> %c)
	%neg = fneg <16 x float> %fma			%neg = fneg <16 x float> %fma
	ret <16 x float> %neg			ret <16 x float> %neg
	}			}

	define <16 x float> @test2_nsz(<16 x float> %a, <16 x float> %b, <16 x float> %c) {			define <16 x float> @test2_nsz(<16 x float> %a, <16 x float> %b, <16 x float> %c) {
	; CHECK-LABEL: test2_nsz:			; CHECK-LABEL: test2_nsz:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vfnmsub213ps {{.#+}} zmm0 = -(zmm1 zmm0) - zmm2			; CHECK-NEXT: vfnmsub213ps {{.#+}} zmm0 = -(zmm1 zmm0) - zmm2
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%fma = call nsz <16 x float> @llvm.fma.v16f32(<16 x float> %a, <16 x float> %b, <16 x float> %c)			%fma = call nsz <16 x float> @llvm.fma.v16f32(<16 x float> %a, <16 x float> %b, <16 x float> %c)
	%neg = fneg <16 x float> %fma			%neg = fneg <16 x float> %fma
	ret <16 x float> %neg			ret <16 x float> %neg
	}			}
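The `test2`/`test2_nsz` and `test3`/`test3_nsz` pairs above differ only in the `nsz` flag. A small sketch of why the flag matters, in Python rather than LLVM IR: the helper names are hypothetical and only mirror the x86 mnemonics, and the plain `x * y + z` stands in for a fused multiply-add, which is only valid here because the chosen products are exact.

```python
import math

# Hypothetical stand-ins for the four x86 FMA forms. They are not fused,
# which is harmless here because every product below is exactly representable.
def fmadd(x, y, z):  return x * y + z
def fmsub(x, y, z):  return x * y - z
def fnmadd(x, y, z): return -(x * y) + z
def fnmsub(x, y, z): return -(x * y) - z

def sign_of(v):
    # copysign distinguishes +0.0 from -0.0, which == cannot do.
    return math.copysign(1.0, v)

# test2: (fneg (fma a, b, c)) versus the folded vfnmsub form.
neg_fmadd = -fmadd(1.0, 1.0, -1.0)      # -(1 + -1) = -(+0) = -0
folded_fnmsub = fnmsub(1.0, 1.0, -1.0)  # -1 - (-1) = +0

# test3: (fneg (fma a, (fneg b), c)) versus the folded vfmsub form.
neg_fnmadd = -fnmadd(1.0, 1.0, 1.0)     # -(-1 + 1) = -(+0) = -0
folded_fmsub = fmsub(1.0, 1.0, 1.0)     # 1 - 1 = +0

print(sign_of(neg_fmadd), sign_of(folded_fnmsub))
print(sign_of(neg_fnmadd), sign_of(folded_fmsub))
```

In both pairs the values compare equal but carry opposite zero signs, which is consistent with the non-`nsz` tests now expecting an explicit `vxorps`/`vpxord` after the FMA.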

	define <16 x float> @test3(<16 x float> %a, <16 x float> %b, <16 x float> %c) {			define <16 x float> @test3(<16 x float> %a, <16 x float> %b, <16 x float> %c) {
	; CHECK-LABEL: test3:			; SKX-LABEL: test3:
	; CHECK: # %bb.0:			; SKX: # %bb.0:
	; CHECK-NEXT: vfmsub213ps {{.#+}} zmm0 = (zmm1 zmm0) - zmm2			; SKX-NEXT: vfnmadd213ps {{.#+}} zmm0 = -(zmm1 zmm0) + zmm2
	; CHECK-NEXT: retq			; SKX-NEXT: vxorps {{.*}}(%rip){1to16}, %zmm0, %zmm0
				; SKX-NEXT: retq
				;
				; KNL-LABEL: test3:
				; KNL: # %bb.0:
				; KNL-NEXT: vfnmadd213ps {{.#+}} zmm0 = -(zmm1 zmm0) + zmm2
				; KNL-NEXT: vpxord {{.*}}(%rip){1to16}, %zmm0, %zmm0
				; KNL-NEXT: retq
	%t0 = fneg <16 x float> %b			%t0 = fneg <16 x float> %b
	%t1 = call <16 x float> @llvm.fma.v16f32(<16 x float> %a, <16 x float> %t0, <16 x float> %c)			%t1 = call <16 x float> @llvm.fma.v16f32(<16 x float> %a, <16 x float> %t0, <16 x float> %c)
	%sub.i = fneg <16 x float> %t1			%sub.i = fneg <16 x float> %t1
	ret <16 x float> %sub.i			ret <16 x float> %sub.i
	}			}

	define <16 x float> @test3_nsz(<16 x float> %a, <16 x float> %b, <16 x float> %c) {			define <16 x float> @test3_nsz(<16 x float> %a, <16 x float> %b, <16 x float> %c) {
	; CHECK-LABEL: test3_nsz:			; CHECK-LABEL: test3_nsz:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vfmsub213ps {{.#+}} zmm0 = (zmm1 zmm0) - zmm2			; CHECK-NEXT: vfmsub213ps {{.#+}} zmm0 = (zmm1 zmm0) - zmm2
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t0 = fneg <16 x float> %b			%t0 = fneg <16 x float> %b
	%t1 = call nsz <16 x float> @llvm.fma.v16f32(<16 x float> %a, <16 x float> %t0, <16 x float> %c)			%t1 = call nsz <16 x float> @llvm.fma.v16f32(<16 x float> %a, <16 x float> %t0, <16 x float> %c)
	%sub.i = fneg <16 x float> %t1			%sub.i = fneg <16 x float> %t1
	ret <16 x float> %sub.i			ret <16 x float> %sub.i
	}			}

	define <16 x float> @test4(<16 x float> %a, <16 x float> %b, <16 x float> %c) {			define <16 x float> @test4(<16 x float> %a, <16 x float> %b, <16 x float> %c) {
	; CHECK-LABEL: test4:			; SKX-LABEL: test4:
	; CHECK: # %bb.0:			; SKX: # %bb.0:
	; CHECK-NEXT: vfmadd213ps {{.#+}} zmm0 = (zmm1 zmm0) + zmm2			; SKX-NEXT: vfnmsub213ps {{.#+}} zmm0 = -(zmm1 zmm0) - zmm2
	; CHECK-NEXT: retq			; SKX-NEXT: vxorps {{.*}}(%rip){1to16}, %zmm0, %zmm0
				; SKX-NEXT: retq
				;
				; KNL-LABEL: test4:
				; KNL: # %bb.0:
				; KNL-NEXT: vfnmsub213ps {{.#+}} zmm0 = -(zmm1 zmm0) - zmm2
				; KNL-NEXT: vpxord {{.*}}(%rip){1to16}, %zmm0, %zmm0
				; KNL-NEXT: retq
	%t0 = fneg <16 x float> %b			%t0 = fneg <16 x float> %b
	%t1 = fneg <16 x float> %c			%t1 = fneg <16 x float> %c
	%t2 = call <16 x float> @llvm.fma.v16f32(<16 x float> %a, <16 x float> %t0, <16 x float> %t1)			%t2 = call <16 x float> @llvm.fma.v16f32(<16 x float> %a, <16 x float> %t0, <16 x float> %t1)
	%sub.i = fsub <16 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %t2			%sub.i = fsub <16 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %t2
	ret <16 x float> %sub.i			ret <16 x float> %sub.i
	}			}

	define <16 x float> @test4_nsz(<16 x float> %a, <16 x float> %b, <16 x float> %c) {			define <16 x float> @test4_nsz(<16 x float> %a, <16 x float> %b, <16 x float> %c) {
	Show All 15 Lines
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%sub.i = fsub <16 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %c			%sub.i = fsub <16 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %c
	%0 = tail call <16 x float> @llvm.x86.avx512.vfmadd.ps.512(<16 x float> %a, <16 x float> %b, <16 x float> %sub.i, i32 10) #2			%0 = tail call <16 x float> @llvm.x86.avx512.vfmadd.ps.512(<16 x float> %a, <16 x float> %b, <16 x float> %sub.i, i32 10) #2
	ret <16 x float> %0			ret <16 x float> %0
	}			}

	define <16 x float> @test6(<16 x float> %a, <16 x float> %b, <16 x float> %c) {			define <16 x float> @test6(<16 x float> %a, <16 x float> %b, <16 x float> %c) {
	; CHECK-LABEL: test6:			; SKX-LABEL: test6:
	; CHECK: # %bb.0:			; SKX: # %bb.0:
	; CHECK-NEXT: vfmadd213ps {ru-sae}, %zmm2, %zmm1, %zmm0			; SKX-NEXT: vfnmsub213ps {ru-sae}, %zmm2, %zmm1, %zmm0
	; CHECK-NEXT: retq			; SKX-NEXT: vxorps {{.*}}(%rip){1to16}, %zmm0, %zmm0
				; SKX-NEXT: retq
				;
				; KNL-LABEL: test6:
				; KNL: # %bb.0:
				; KNL-NEXT: vfnmsub213ps {ru-sae}, %zmm2, %zmm1, %zmm0
				; KNL-NEXT: vpxord {{.*}}(%rip){1to16}, %zmm0, %zmm0
				; KNL-NEXT: retq
	%t0 = fneg <16 x float> %b			%t0 = fneg <16 x float> %b
	%t1 = fneg <16 x float> %c			%t1 = fneg <16 x float> %c
	%t2 = call <16 x float> @llvm.x86.avx512.vfmadd.ps.512(<16 x float> %a, <16 x float> %t0, <16 x float> %t1, i32 10)			%t2 = call <16 x float> @llvm.x86.avx512.vfmadd.ps.512(<16 x float> %a, <16 x float> %t0, <16 x float> %t1, i32 10)
	%sub.i = fsub <16 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %t2			%sub.i = fsub <16 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %t2
	ret <16 x float> %sub.i			ret <16 x float> %sub.i
	}			}

	define <16 x float> @test6_nsz(<16 x float> %a, <16 x float> %b, <16 x float> %c) {			define <16 x float> @test6_nsz(<16 x float> %a, <16 x float> %b, <16 x float> %c) {
	; CHECK-LABEL: test6_nsz:			; CHECK-LABEL: test6_nsz:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vfmadd213ps {ru-sae}, %zmm2, %zmm1, %zmm0			; CHECK-NEXT: vfmadd213ps {ru-sae}, %zmm2, %zmm1, %zmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t0 = fneg <16 x float> %b			%t0 = fneg <16 x float> %b
	%t1 = fneg <16 x float> %c			%t1 = fneg <16 x float> %c
	%t2 = call nsz <16 x float> @llvm.x86.avx512.vfmadd.ps.512(<16 x float> %a, <16 x float> %t0, <16 x float> %t1, i32 10)			%t2 = call nsz <16 x float> @llvm.x86.avx512.vfmadd.ps.512(<16 x float> %a, <16 x float> %t0, <16 x float> %t1, i32 10)
	%sub.i = fsub <16 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %t2			%sub.i = fsub <16 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %t2
	ret <16 x float> %sub.i			ret <16 x float> %sub.i
	}			}

	define <8 x float> @test7(<8 x float> %a, <8 x float> %b, <8 x float> %c) {			define <8 x float> @test7(<8 x float> %a, <8 x float> %b, <8 x float> %c) {
	; CHECK-LABEL: test7:			; SKX-LABEL: test7:
	; CHECK: # %bb.0:			; SKX: # %bb.0:
	; CHECK-NEXT: vfnmadd213ps {{.#+}} ymm0 = -(ymm1 ymm0) + ymm2			; SKX-NEXT: vfmsub213ps {{.#+}} ymm0 = (ymm1 ymm0) - ymm2
	; CHECK-NEXT: retq			; SKX-NEXT: vxorps {{.*}}(%rip){1to8}, %ymm0, %ymm0
				; SKX-NEXT: retq
				;
				; KNL-LABEL: test7:
				; KNL: # %bb.0:
				; KNL-NEXT: vfmsub213ps {{.#+}} ymm0 = (ymm1 ymm0) - ymm2
				; KNL-NEXT: vbroadcastss {{.*#+}} ymm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
				; KNL-NEXT: vxorps %ymm1, %ymm0, %ymm0
				; KNL-NEXT: retq
	%t0 = fneg <8 x float> %c			%t0 = fneg <8 x float> %c
	%t1 = call <8 x float> @llvm.fma.v8f32(<8 x float> %a, <8 x float> %b, <8 x float> %t0)			%t1 = call <8 x float> @llvm.fma.v8f32(<8 x float> %a, <8 x float> %b, <8 x float> %t0)
	%sub.i = fsub <8 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %t1			%sub.i = fsub <8 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %t1
	ret <8 x float> %sub.i			ret <8 x float> %sub.i
	}			}

	define <8 x float> @test7_nsz(<8 x float> %a, <8 x float> %b, <8 x float> %c) {			define <8 x float> @test7_nsz(<8 x float> %a, <8 x float> %b, <8 x float> %c) {
	; CHECK-LABEL: test7_nsz:			; CHECK-LABEL: test7_nsz:
	Show All 13 Lines
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%sub.c = fsub <8 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %c			%sub.c = fsub <8 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %c
	%0 = tail call <8 x float> @llvm.x86.fma.vfmsub.ps.256(<8 x float> %a, <8 x float> %b, <8 x float> %sub.c) #2			%0 = tail call <8 x float> @llvm.x86.fma.vfmsub.ps.256(<8 x float> %a, <8 x float> %b, <8 x float> %sub.c) #2
	ret <8 x float> %0			ret <8 x float> %0
	}			}

	define <8 x double> @test9(<8 x double> %a, <8 x double> %b, <8 x double> %c) {			define <8 x double> @test9(<8 x double> %a, <8 x double> %b, <8 x double> %c) {
	; CHECK-LABEL: test9:			; SKX-LABEL: test9:
	; CHECK: # %bb.0:			; SKX: # %bb.0:
	; CHECK-NEXT: vfnmsub213pd {{.#+}} zmm0 = -(zmm1 zmm0) - zmm2			; SKX-NEXT: vfmadd213pd {{.#+}} zmm0 = (zmm1 zmm0) + zmm2
	; CHECK-NEXT: retq			; SKX-NEXT: vxorpd {{.*}}(%rip){1to8}, %zmm0, %zmm0
				; SKX-NEXT: retq
				;
				; KNL-LABEL: test9:
				; KNL: # %bb.0:
				; KNL-NEXT: vfmadd213pd {{.#+}} zmm0 = (zmm1 zmm0) + zmm2
				; KNL-NEXT: vpxorq {{.*}}(%rip){1to8}, %zmm0, %zmm0
				; KNL-NEXT: retq
	%t0 = tail call <8 x double> @llvm.x86.avx512.vfmadd.pd.512(<8 x double> %a, <8 x double> %b, <8 x double> %c, i32 4)			%t0 = tail call <8 x double> @llvm.x86.avx512.vfmadd.pd.512(<8 x double> %a, <8 x double> %b, <8 x double> %c, i32 4)
	%sub.i = fneg <8 x double> %t0			%sub.i = fneg <8 x double> %t0
	ret <8 x double> %sub.i			ret <8 x double> %sub.i
	}			}

	define <8 x double> @test9_nsz(<8 x double> %a, <8 x double> %b, <8 x double> %c) {			define <8 x double> @test9_nsz(<8 x double> %a, <8 x double> %b, <8 x double> %c) {
	; CHECK-LABEL: test9_nsz:			; CHECK-LABEL: test9_nsz:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	[remaining 339 lines of this file not shown]

llvm/test/CodeGen/X86/fma-signed-zero.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+fma \| FileCheck %s		; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+fma \| FileCheck %s

; This test checks that (fneg (fma (fneg x), y, (fneg z))) can't be folded to (fma x, y, z)		; This test checks that (fneg (fma (fneg x), y, (fneg z))) can't be folded to (fma x, y, z)
; without no signed zeros flag (nsz).		; without no signed zeros flag (nsz).

declare float @llvm.fma.f32(float, float, float)		declare float @llvm.fma.f32(float, float, float)

define float @fneg_fma32(float %x, float %y, float %z) {		define float @fneg_fma32(float %x, float %y, float %z) {
; CHECK-LABEL: fneg_fma32:		; CHECK-LABEL: fneg_fma32:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: vfmadd213ss {{.#+}} xmm0 = (xmm1 xmm0) + xmm2		; CHECK-NEXT: vfnmsub213ss {{.#+}} xmm0 = -(xmm1 xmm0) - xmm2
		qiucf (inline comment): You can pre-commit this test (or at least stage it first and run git diff) so that it is clear which optimizations are prevented.
		; CHECK-NEXT: vxorps {{.*}}(%rip), %xmm0, %xmm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%negx = fneg float %x		%negx = fneg float %x
%negz = fneg float %z		%negz = fneg float %z
%fma = call float @llvm.fma.f32(float %negx, float %y, float %negz)		%fma = call float @llvm.fma.f32(float %negx, float %y, float %negz)
%n = fneg float %fma		%n = fneg float %fma
ret float %n		ret float %n
}		}
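As a rough illustration of what `fneg_fma32` guards against, the sketch below evaluates both sides of the fold for one input in Python. The name `fma_exact` is hypothetical; it uses an ordinary multiply and add in place of a fused operation, an assumption that holds only because the product is exact for these inputs.

```python
import math

def fma_exact(x, y, z):
    # Stand-in for fma(x, y, z); valid only when x * y is exactly representable.
    return x * y + z

x, y, z = 1.0, 1.0, -1.0

# (fneg (fma (fneg x), y, (fneg z))): -((-1) + 1) = -(+0) = -0
original = -fma_exact(-x, y, -z)

# (fma x, y, z): 1 + (-1) = +0 under round-to-nearest
folded = fma_exact(x, y, z)

print(math.copysign(1.0, original), math.copysign(1.0, folded))
```

The two results are numerically equal but differ in the sign of zero, so the fold should only be applied when `nsz` permits ignoring that sign.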

Show All 9 Lines	; CHECK-NEXT: retq
ret float %n		ret float %n
}		}

declare double @llvm.fma.f64(double, double, double)		declare double @llvm.fma.f64(double, double, double)

define double @fneg_fma64(double %x, double %y, double %z) {		define double @fneg_fma64(double %x, double %y, double %z) {
; CHECK-LABEL: fneg_fma64:		; CHECK-LABEL: fneg_fma64:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: vfmadd213sd {{.#+}} xmm0 = (xmm1 xmm0) + xmm2		; CHECK-NEXT: vfnmsub213sd {{.#+}} xmm0 = -(xmm1 xmm0) - xmm2
		; CHECK-NEXT: vxorpd {{.*}}(%rip), %xmm0, %xmm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%negx = fneg double %x		%negx = fneg double %x
%negz = fneg double %z		%negz = fneg double %z
%fma = call double @llvm.fma.f64(double %negx, double %y, double %negz)		%fma = call double @llvm.fma.f64(double %negx, double %y, double %negz)
%n = fneg double %fma		%n = fneg double %fma
ret double %n		ret double %n
}		}

Show All 11 Lines

llvm/test/CodeGen/X86/fma_patterns.ll

	[first 1,302 lines of this file not shown]
	; FMA4-NOINFS-NEXT: vfmsubss {{.#+}} xmm0 = (xmm0 xmm2) - xmm1			; FMA4-NOINFS-NEXT: vfmsubss {{.#+}} xmm0 = (xmm0 xmm2) - xmm1
	; FMA4-NOINFS-NEXT: retq			; FMA4-NOINFS-NEXT: retq
	;			;
	; AVX512-NOINFS-LABEL: test_f32_interp:			; AVX512-NOINFS-LABEL: test_f32_interp:
	; AVX512-NOINFS: # %bb.0:			; AVX512-NOINFS: # %bb.0:
	; AVX512-NOINFS-NEXT: vfmsub213ss {{.#+}} xmm1 = (xmm2 xmm1) - xmm1			; AVX512-NOINFS-NEXT: vfmsub213ss {{.#+}} xmm1 = (xmm2 xmm1) - xmm1
	; AVX512-NOINFS-NEXT: vfmsub213ss {{.#+}} xmm0 = (xmm2 xmm0) - xmm1			; AVX512-NOINFS-NEXT: vfmsub213ss {{.#+}} xmm0 = (xmm2 xmm0) - xmm1
	; AVX512-NOINFS-NEXT: retq			; AVX512-NOINFS-NEXT: retq
	%t1 = fsub float 1.0, %t			%t1 = fsub nsz float 1.0, %t
	%tx = fmul float %x, %t			%tx = fmul nsz float %x, %t
	%ty = fmul float %y, %t1			%ty = fmul nsz float %y, %t1
	%r = fadd float %tx, %ty			%r = fadd nsz float %tx, %ty
	ret float %r			ret float %r
	}			}

	define <4 x float> @test_v4f32_interp(<4 x float> %x, <4 x float> %y, <4 x float> %t) {			define <4 x float> @test_v4f32_interp(<4 x float> %x, <4 x float> %y, <4 x float> %t) {
	; FMA-INFS-LABEL: test_v4f32_interp:			; FMA-INFS-LABEL: test_v4f32_interp:
	; FMA-INFS: # %bb.0:			; FMA-INFS: # %bb.0:
	; FMA-INFS-NEXT: vmovaps {{.*#+}} xmm3 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0]			; FMA-INFS-NEXT: vmovaps {{.*#+}} xmm3 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0]
	; FMA-INFS-NEXT: vsubps %xmm2, %xmm3, %xmm3			; FMA-INFS-NEXT: vsubps %xmm2, %xmm3, %xmm3
	Show All 29 Lines
	; FMA4-NOINFS-NEXT: vfmsubps {{.#+}} xmm0 = (xmm0 xmm2) - xmm1			; FMA4-NOINFS-NEXT: vfmsubps {{.#+}} xmm0 = (xmm0 xmm2) - xmm1
	; FMA4-NOINFS-NEXT: retq			; FMA4-NOINFS-NEXT: retq
	;			;
	; AVX512-NOINFS-LABEL: test_v4f32_interp:			; AVX512-NOINFS-LABEL: test_v4f32_interp:
	; AVX512-NOINFS: # %bb.0:			; AVX512-NOINFS: # %bb.0:
	; AVX512-NOINFS-NEXT: vfmsub213ps {{.#+}} xmm1 = (xmm2 xmm1) - xmm1			; AVX512-NOINFS-NEXT: vfmsub213ps {{.#+}} xmm1 = (xmm2 xmm1) - xmm1
	; AVX512-NOINFS-NEXT: vfmsub213ps {{.#+}} xmm0 = (xmm2 xmm0) - xmm1			; AVX512-NOINFS-NEXT: vfmsub213ps {{.#+}} xmm0 = (xmm2 xmm0) - xmm1
	; AVX512-NOINFS-NEXT: retq			; AVX512-NOINFS-NEXT: retq
	%t1 = fsub <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>, %t			%t1 = fsub nsz <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>, %t
	%tx = fmul <4 x float> %x, %t			%tx = fmul nsz <4 x float> %x, %t
	%ty = fmul <4 x float> %y, %t1			%ty = fmul nsz <4 x float> %y, %t1
	%r = fadd <4 x float> %tx, %ty			%r = fadd nsz <4 x float> %tx, %ty
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	define <8 x float> @test_v8f32_interp(<8 x float> %x, <8 x float> %y, <8 x float> %t) {			define <8 x float> @test_v8f32_interp(<8 x float> %x, <8 x float> %y, <8 x float> %t) {
	; FMA-INFS-LABEL: test_v8f32_interp:			; FMA-INFS-LABEL: test_v8f32_interp:
	; FMA-INFS: # %bb.0:			; FMA-INFS: # %bb.0:
	; FMA-INFS-NEXT: vmovaps {{.*#+}} ymm3 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0]			; FMA-INFS-NEXT: vmovaps {{.*#+}} ymm3 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0]
	; FMA-INFS-NEXT: vsubps %ymm2, %ymm3, %ymm3			; FMA-INFS-NEXT: vsubps %ymm2, %ymm3, %ymm3
	Show All 29 Lines
	; FMA4-NOINFS-NEXT: vfmsubps {{.#+}} ymm0 = (ymm0 ymm2) - ymm1			; FMA4-NOINFS-NEXT: vfmsubps {{.#+}} ymm0 = (ymm0 ymm2) - ymm1
	; FMA4-NOINFS-NEXT: retq			; FMA4-NOINFS-NEXT: retq
	;			;
	; AVX512-NOINFS-LABEL: test_v8f32_interp:			; AVX512-NOINFS-LABEL: test_v8f32_interp:
	; AVX512-NOINFS: # %bb.0:			; AVX512-NOINFS: # %bb.0:
	; AVX512-NOINFS-NEXT: vfmsub213ps {{.#+}} ymm1 = (ymm2 ymm1) - ymm1			; AVX512-NOINFS-NEXT: vfmsub213ps {{.#+}} ymm1 = (ymm2 ymm1) - ymm1
	; AVX512-NOINFS-NEXT: vfmsub213ps {{.#+}} ymm0 = (ymm2 ymm0) - ymm1			; AVX512-NOINFS-NEXT: vfmsub213ps {{.#+}} ymm0 = (ymm2 ymm0) - ymm1
	; AVX512-NOINFS-NEXT: retq			; AVX512-NOINFS-NEXT: retq
	%t1 = fsub <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, %t			%t1 = fsub nsz <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, %t
	%tx = fmul <8 x float> %x, %t			%tx = fmul nsz <8 x float> %x, %t
	%ty = fmul <8 x float> %y, %t1			%ty = fmul nsz <8 x float> %y, %t1
	%r = fadd <8 x float> %tx, %ty			%r = fadd nsz <8 x float> %tx, %ty
	ret <8 x float> %r			ret <8 x float> %r
	}			}

	define double @test_f64_interp(double %x, double %y, double %t) {			define double @test_f64_interp(double %x, double %y, double %t) {
	; FMA-INFS-LABEL: test_f64_interp:			; FMA-INFS-LABEL: test_f64_interp:
	; FMA-INFS: # %bb.0:			; FMA-INFS: # %bb.0:
	; FMA-INFS-NEXT: vmovsd {{.*#+}} xmm3 = mem[0],zero			; FMA-INFS-NEXT: vmovsd {{.*#+}} xmm3 = mem[0],zero
	; FMA-INFS-NEXT: vsubsd %xmm2, %xmm3, %xmm3			; FMA-INFS-NEXT: vsubsd %xmm2, %xmm3, %xmm3
	Show All 29 Lines
	; FMA4-NOINFS-NEXT: vfmsubsd {{.#+}} xmm0 = (xmm0 xmm2) - xmm1			; FMA4-NOINFS-NEXT: vfmsubsd {{.#+}} xmm0 = (xmm0 xmm2) - xmm1
	; FMA4-NOINFS-NEXT: retq			; FMA4-NOINFS-NEXT: retq
	;			;
	; AVX512-NOINFS-LABEL: test_f64_interp:			; AVX512-NOINFS-LABEL: test_f64_interp:
	; AVX512-NOINFS: # %bb.0:			; AVX512-NOINFS: # %bb.0:
	; AVX512-NOINFS-NEXT: vfmsub213sd {{.#+}} xmm1 = (xmm2 xmm1) - xmm1			; AVX512-NOINFS-NEXT: vfmsub213sd {{.#+}} xmm1 = (xmm2 xmm1) - xmm1
	; AVX512-NOINFS-NEXT: vfmsub213sd {{.#+}} xmm0 = (xmm2 xmm0) - xmm1			; AVX512-NOINFS-NEXT: vfmsub213sd {{.#+}} xmm0 = (xmm2 xmm0) - xmm1
	; AVX512-NOINFS-NEXT: retq			; AVX512-NOINFS-NEXT: retq
	%t1 = fsub double 1.0, %t			%t1 = fsub nsz double 1.0, %t
	%tx = fmul double %x, %t			%tx = fmul nsz double %x, %t
	%ty = fmul double %y, %t1			%ty = fmul nsz double %y, %t1
	%r = fadd double %tx, %ty			%r = fadd nsz double %tx, %ty
	ret double %r			ret double %r
	}			}

	define <2 x double> @test_v2f64_interp(<2 x double> %x, <2 x double> %y, <2 x double> %t) {			define <2 x double> @test_v2f64_interp(<2 x double> %x, <2 x double> %y, <2 x double> %t) {
	; FMA-INFS-LABEL: test_v2f64_interp:			; FMA-INFS-LABEL: test_v2f64_interp:
	; FMA-INFS: # %bb.0:			; FMA-INFS: # %bb.0:
	; FMA-INFS-NEXT: vmovapd {{.*#+}} xmm3 = [1.0E+0,1.0E+0]			; FMA-INFS-NEXT: vmovapd {{.*#+}} xmm3 = [1.0E+0,1.0E+0]
	; FMA-INFS-NEXT: vsubpd %xmm2, %xmm3, %xmm3			; FMA-INFS-NEXT: vsubpd %xmm2, %xmm3, %xmm3
	Show All 29 Lines
	; FMA4-NOINFS-NEXT: vfmsubpd {{.#+}} xmm0 = (xmm0 xmm2) - xmm1			; FMA4-NOINFS-NEXT: vfmsubpd {{.#+}} xmm0 = (xmm0 xmm2) - xmm1
	; FMA4-NOINFS-NEXT: retq			; FMA4-NOINFS-NEXT: retq
	;			;
	; AVX512-NOINFS-LABEL: test_v2f64_interp:			; AVX512-NOINFS-LABEL: test_v2f64_interp:
	; AVX512-NOINFS: # %bb.0:			; AVX512-NOINFS: # %bb.0:
	; AVX512-NOINFS-NEXT: vfmsub213pd {{.#+}} xmm1 = (xmm2 xmm1) - xmm1			; AVX512-NOINFS-NEXT: vfmsub213pd {{.#+}} xmm1 = (xmm2 xmm1) - xmm1
	; AVX512-NOINFS-NEXT: vfmsub213pd {{.#+}} xmm0 = (xmm2 xmm0) - xmm1			; AVX512-NOINFS-NEXT: vfmsub213pd {{.#+}} xmm0 = (xmm2 xmm0) - xmm1
	; AVX512-NOINFS-NEXT: retq			; AVX512-NOINFS-NEXT: retq
	%t1 = fsub <2 x double> <double 1.0, double 1.0>, %t			%t1 = fsub nsz <2 x double> <double 1.0, double 1.0>, %t
	%tx = fmul <2 x double> %x, %t			%tx = fmul nsz <2 x double> %x, %t
	%ty = fmul <2 x double> %y, %t1			%ty = fmul nsz <2 x double> %y, %t1
	%r = fadd <2 x double> %tx, %ty			%r = fadd nsz <2 x double> %tx, %ty
	ret <2 x double> %r			ret <2 x double> %r
	}			}

	define <4 x double> @test_v4f64_interp(<4 x double> %x, <4 x double> %y, <4 x double> %t) {			define <4 x double> @test_v4f64_interp(<4 x double> %x, <4 x double> %y, <4 x double> %t) {
	; FMA-INFS-LABEL: test_v4f64_interp:			; FMA-INFS-LABEL: test_v4f64_interp:
	; FMA-INFS: # %bb.0:			; FMA-INFS: # %bb.0:
	; FMA-INFS-NEXT: vmovapd {{.*#+}} ymm3 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0]			; FMA-INFS-NEXT: vmovapd {{.*#+}} ymm3 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0]
	; FMA-INFS-NEXT: vsubpd %ymm2, %ymm3, %ymm3			; FMA-INFS-NEXT: vsubpd %ymm2, %ymm3, %ymm3
	Show All 29 Lines
	; FMA4-NOINFS-NEXT: vfmsubpd {{.#+}} ymm0 = (ymm0 ymm2) - ymm1			; FMA4-NOINFS-NEXT: vfmsubpd {{.#+}} ymm0 = (ymm0 ymm2) - ymm1
	; FMA4-NOINFS-NEXT: retq			; FMA4-NOINFS-NEXT: retq
	;			;
	; AVX512-NOINFS-LABEL: test_v4f64_interp:			; AVX512-NOINFS-LABEL: test_v4f64_interp:
	; AVX512-NOINFS: # %bb.0:			; AVX512-NOINFS: # %bb.0:
	; AVX512-NOINFS-NEXT: vfmsub213pd {{.*#+}} ymm1 = (ymm2 * ymm1) - ymm1			; AVX512-NOINFS-NEXT: vfmsub213pd {{.*#+}} ymm1 = (ymm2 * ymm1) - ymm1
	; AVX512-NOINFS-NEXT: vfmsub213pd {{.*#+}} ymm0 = (ymm2 * ymm0) - ymm1			; AVX512-NOINFS-NEXT: vfmsub213pd {{.*#+}} ymm0 = (ymm2 * ymm0) - ymm1
	; AVX512-NOINFS-NEXT: retq			; AVX512-NOINFS-NEXT: retq
	%t1 = fsub <4 x double> <double 1.0, double 1.0, double 1.0, double 1.0>, %t			%t1 = fsub nsz <4 x double> <double 1.0, double 1.0, double 1.0, double 1.0>, %t
	%tx = fmul <4 x double> %x, %t			%tx = fmul nsz <4 x double> %x, %t
	%ty = fmul <4 x double> %y, %t1			%ty = fmul nsz <4 x double> %y, %t1
	%r = fadd <4 x double> %tx, %ty			%r = fadd nsz <4 x double> %tx, %ty
	ret <4 x double> %r			ret <4 x double> %r
	}			}

	;			;
	; Pattern: (fneg (fma x, y, z)) -> (fma x, -y, -z)			; Pattern: (fneg (fma x, y, z)) -> (fma x, -y, -z)
	;			;

	define <4 x float> @test_v4f32_fneg_fmadd(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) #0 {			define <4 x float> @test_v4f32_fneg_fmadd(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) #0 {
	; FMA-LABEL: test_v4f32_fneg_fmadd:			; FMA-LABEL: test_v4f32_fneg_fmadd:
	; FMA: # %bb.0:			; FMA: # %bb.0:
	; FMA-NEXT: vfnmsub213ps {{.*#+}} xmm0 = -(xmm1 * xmm0) - xmm2			; FMA-NEXT: vfnmsub213ps {{.*#+}} xmm0 = -(xmm1 * xmm0) - xmm2
	; FMA-NEXT: retq			; FMA-NEXT: retq
	;			;
	; FMA4-LABEL: test_v4f32_fneg_fmadd:			; FMA4-LABEL: test_v4f32_fneg_fmadd:
	; FMA4: # %bb.0:			; FMA4: # %bb.0:
	; FMA4-NEXT: vfnmsubps {{.*#+}} xmm0 = -(xmm0 * xmm1) - xmm2			; FMA4-NEXT: vfnmsubps {{.*#+}} xmm0 = -(xmm0 * xmm1) - xmm2
	; FMA4-NEXT: retq			; FMA4-NEXT: retq
	;			;
	; AVX512-LABEL: test_v4f32_fneg_fmadd:			; AVX512-LABEL: test_v4f32_fneg_fmadd:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vfnmsub213ps {{.*#+}} xmm0 = -(xmm1 * xmm0) - xmm2			; AVX512-NEXT: vfnmsub213ps {{.*#+}} xmm0 = -(xmm1 * xmm0) - xmm2
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%mul = fmul <4 x float> %a0, %a1			%mul = fmul nsz <4 x float> %a0, %a1
	%add = fadd <4 x float> %mul, %a2			%add = fadd nsz <4 x float> %mul, %a2
	%neg = fsub <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %add			%neg = fsub nsz <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %add
	ret <4 x float> %neg			ret <4 x float> %neg
	}			}

	define <4 x double> @test_v4f64_fneg_fmsub(<4 x double> %a0, <4 x double> %a1, <4 x double> %a2) #0 {			define <4 x double> @test_v4f64_fneg_fmsub(<4 x double> %a0, <4 x double> %a1, <4 x double> %a2) #0 {
	; FMA-LABEL: test_v4f64_fneg_fmsub:			; FMA-LABEL: test_v4f64_fneg_fmsub:
	; FMA: # %bb.0:			; FMA: # %bb.0:
	; FMA-NEXT: vfnmadd213pd {{.*#+}} ymm0 = -(ymm1 * ymm0) + ymm2			; FMA-NEXT: vfnmadd213pd {{.*#+}} ymm0 = -(ymm1 * ymm0) + ymm2
	; FMA-NEXT: retq			; FMA-NEXT: retq
	;			;
	; FMA4-LABEL: test_v4f64_fneg_fmsub:			; FMA4-LABEL: test_v4f64_fneg_fmsub:
	; FMA4: # %bb.0:			; FMA4: # %bb.0:
	; FMA4-NEXT: vfnmaddpd {{.*#+}} ymm0 = -(ymm0 * ymm1) + ymm2			; FMA4-NEXT: vfnmaddpd {{.*#+}} ymm0 = -(ymm0 * ymm1) + ymm2
	; FMA4-NEXT: retq			; FMA4-NEXT: retq
	;			;
	; AVX512-LABEL: test_v4f64_fneg_fmsub:			; AVX512-LABEL: test_v4f64_fneg_fmsub:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vfnmadd213pd {{.*#+}} ymm0 = -(ymm1 * ymm0) + ymm2			; AVX512-NEXT: vfnmadd213pd {{.*#+}} ymm0 = -(ymm1 * ymm0) + ymm2
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%mul = fmul <4 x double> %a0, %a1			%mul = fmul nsz <4 x double> %a0, %a1
	%sub = fsub <4 x double> %mul, %a2			%sub = fsub nsz <4 x double> %mul, %a2
	%neg = fsub <4 x double> <double -0.0, double -0.0, double -0.0, double -0.0>, %sub			%neg = fsub nsz <4 x double> <double -0.0, double -0.0, double -0.0, double -0.0>, %sub
	ret <4 x double> %neg			ret <4 x double> %neg
	}			}

	define <4 x float> @test_v4f32_fneg_fnmadd(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) #0 {			define <4 x float> @test_v4f32_fneg_fnmadd(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) #0 {
	; FMA-LABEL: test_v4f32_fneg_fnmadd:			; FMA-LABEL: test_v4f32_fneg_fnmadd:
	; FMA: # %bb.0:			; FMA: # %bb.0:
	; FMA-NEXT: vfmsub213ps {{.*#+}} xmm0 = (xmm1 * xmm0) - xmm2			; FMA-NEXT: vfmsub213ps {{.*#+}} xmm0 = (xmm1 * xmm0) - xmm2
	; FMA-NEXT: retq			; FMA-NEXT: retq
	;			;
	; FMA4-LABEL: test_v4f32_fneg_fnmadd:			; FMA4-LABEL: test_v4f32_fneg_fnmadd:
	; FMA4: # %bb.0:			; FMA4: # %bb.0:
	; FMA4-NEXT: vfmsubps {{.*#+}} xmm0 = (xmm0 * xmm1) - xmm2			; FMA4-NEXT: vfmsubps {{.*#+}} xmm0 = (xmm0 * xmm1) - xmm2
	; FMA4-NEXT: retq			; FMA4-NEXT: retq
	;			;
	; AVX512-LABEL: test_v4f32_fneg_fnmadd:			; AVX512-LABEL: test_v4f32_fneg_fnmadd:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vfmsub213ps {{.*#+}} xmm0 = (xmm1 * xmm0) - xmm2			; AVX512-NEXT: vfmsub213ps {{.*#+}} xmm0 = (xmm1 * xmm0) - xmm2
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%mul = fmul <4 x float> %a0, %a1			%mul = fmul nsz <4 x float> %a0, %a1
	%neg0 = fsub <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %mul			%neg0 = fsub nsz <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %mul
	%add = fadd <4 x float> %neg0, %a2			%add = fadd nsz <4 x float> %neg0, %a2
	%neg1 = fsub <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %add			%neg1 = fsub nsz <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %add
	ret <4 x float> %neg1			ret <4 x float> %neg1
	}			}

	define <4 x double> @test_v4f64_fneg_fnmsub(<4 x double> %a0, <4 x double> %a1, <4 x double> %a2) #0 {			define <4 x double> @test_v4f64_fneg_fnmsub(<4 x double> %a0, <4 x double> %a1, <4 x double> %a2) #0 {
	; FMA-LABEL: test_v4f64_fneg_fnmsub:			; FMA-LABEL: test_v4f64_fneg_fnmsub:
	; FMA: # %bb.0:			; FMA: # %bb.0:
	; FMA-NEXT: vfmadd213pd {{.*#+}} ymm0 = (ymm1 * ymm0) + ymm2			; FMA-NEXT: vfmadd213pd {{.*#+}} ymm0 = (ymm1 * ymm0) + ymm2
	; FMA-NEXT: retq			; FMA-NEXT: retq
	;			;
	; FMA4-LABEL: test_v4f64_fneg_fnmsub:			; FMA4-LABEL: test_v4f64_fneg_fnmsub:
	; FMA4: # %bb.0:			; FMA4: # %bb.0:
	; FMA4-NEXT: vfmaddpd {{.*#+}} ymm0 = (ymm0 * ymm1) + ymm2			; FMA4-NEXT: vfmaddpd {{.*#+}} ymm0 = (ymm0 * ymm1) + ymm2
	; FMA4-NEXT: retq			; FMA4-NEXT: retq
	;			;
	; AVX512-LABEL: test_v4f64_fneg_fnmsub:			; AVX512-LABEL: test_v4f64_fneg_fnmsub:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vfmadd213pd {{.*#+}} ymm0 = (ymm1 * ymm0) + ymm2			; AVX512-NEXT: vfmadd213pd {{.*#+}} ymm0 = (ymm1 * ymm0) + ymm2
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%mul = fmul <4 x double> %a0, %a1			%mul = fmul nsz <4 x double> %a0, %a1
	%neg0 = fsub <4 x double> <double -0.0, double -0.0, double -0.0, double -0.0>, %mul			%neg0 = fsub nsz <4 x double> <double -0.0, double -0.0, double -0.0, double -0.0>, %mul
	%sub = fsub <4 x double> %neg0, %a2			%sub = fsub nsz <4 x double> %neg0, %a2
	%neg1 = fsub <4 x double> <double -0.0, double -0.0, double -0.0, double -0.0>, %sub			%neg1 = fsub nsz <4 x double> <double -0.0, double -0.0, double -0.0, double -0.0>, %sub
	ret <4 x double> %neg1			ret <4 x double> %neg1
	}			}

	;			;
	; Pattern: (fma x, c1, (fmul x, c2)) -> (fmul x, c1+c2)			; Pattern: (fma x, c1, (fmul x, c2)) -> (fmul x, c1+c2)
	;			;

	define <4 x float> @test_v4f32_fma_x_c1_fmul_x_c2(<4 x float> %x) #0 {			define <4 x float> @test_v4f32_fma_x_c1_fmul_x_c2(<4 x float> %x) #0 {
	[366 lines not shown]

llvm/test/CodeGen/X86/fma_patterns_wide.ll

	[862 lines not shown]
	; FMA4-NOINFS-NEXT: vfmsubps {{.*#+}} ymm1 = (ymm1 * ymm5) - ymm3			; FMA4-NOINFS-NEXT: vfmsubps {{.*#+}} ymm1 = (ymm1 * ymm5) - ymm3
	; FMA4-NOINFS-NEXT: retq			; FMA4-NOINFS-NEXT: retq
	;			;
	; AVX512-NOINFS-LABEL: test_v16f32_interp:			; AVX512-NOINFS-LABEL: test_v16f32_interp:
	; AVX512-NOINFS: # %bb.0:			; AVX512-NOINFS: # %bb.0:
	; AVX512-NOINFS-NEXT: vfmsub213ps {{.*#+}} zmm1 = (zmm2 * zmm1) - zmm1			; AVX512-NOINFS-NEXT: vfmsub213ps {{.*#+}} zmm1 = (zmm2 * zmm1) - zmm1
	; AVX512-NOINFS-NEXT: vfmsub213ps {{.*#+}} zmm0 = (zmm2 * zmm0) - zmm1			; AVX512-NOINFS-NEXT: vfmsub213ps {{.*#+}} zmm0 = (zmm2 * zmm0) - zmm1
	; AVX512-NOINFS-NEXT: retq			; AVX512-NOINFS-NEXT: retq
	%t1 = fsub <16 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, %t			%t1 = fsub nsz <16 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, %t
	%tx = fmul <16 x float> %x, %t			%tx = fmul nsz <16 x float> %x, %t
	%ty = fmul <16 x float> %y, %t1			%ty = fmul nsz <16 x float> %y, %t1
	%r = fadd <16 x float> %tx, %ty			%r = fadd nsz <16 x float> %tx, %ty
	ret <16 x float> %r			ret <16 x float> %r
	}			}

	define <8 x double> @test_v8f64_interp(<8 x double> %x, <8 x double> %y, <8 x double> %t) {			define <8 x double> @test_v8f64_interp(<8 x double> %x, <8 x double> %y, <8 x double> %t) {
	; FMA-INFS-LABEL: test_v8f64_interp:			; FMA-INFS-LABEL: test_v8f64_interp:
	; FMA-INFS: # %bb.0:			; FMA-INFS: # %bb.0:
	; FMA-INFS-NEXT: vmovapd {{.*#+}} ymm6 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0]			; FMA-INFS-NEXT: vmovapd {{.*#+}} ymm6 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0]
	; FMA-INFS-NEXT: vsubpd %ymm4, %ymm6, %ymm7			; FMA-INFS-NEXT: vsubpd %ymm4, %ymm6, %ymm7
	[39 lines not shown]
	; FMA4-NOINFS-NEXT: vfmsubpd {{.*#+}} ymm1 = (ymm1 * ymm5) - ymm3			; FMA4-NOINFS-NEXT: vfmsubpd {{.*#+}} ymm1 = (ymm1 * ymm5) - ymm3
	; FMA4-NOINFS-NEXT: retq			; FMA4-NOINFS-NEXT: retq
	;			;
	; AVX512-NOINFS-LABEL: test_v8f64_interp:			; AVX512-NOINFS-LABEL: test_v8f64_interp:
	; AVX512-NOINFS: # %bb.0:			; AVX512-NOINFS: # %bb.0:
	; AVX512-NOINFS-NEXT: vfmsub213pd {{.*#+}} zmm1 = (zmm2 * zmm1) - zmm1			; AVX512-NOINFS-NEXT: vfmsub213pd {{.*#+}} zmm1 = (zmm2 * zmm1) - zmm1
	; AVX512-NOINFS-NEXT: vfmsub213pd {{.*#+}} zmm0 = (zmm2 * zmm0) - zmm1			; AVX512-NOINFS-NEXT: vfmsub213pd {{.*#+}} zmm0 = (zmm2 * zmm0) - zmm1
	; AVX512-NOINFS-NEXT: retq			; AVX512-NOINFS-NEXT: retq
	%t1 = fsub <8 x double> <double 1.0, double 1.0, double 1.0, double 1.0, double 1.0, double 1.0, double 1.0, double 1.0>, %t			%t1 = fsub nsz <8 x double> <double 1.0, double 1.0, double 1.0, double 1.0, double 1.0, double 1.0, double 1.0, double 1.0>, %t
	%tx = fmul <8 x double> %x, %t			%tx = fmul nsz <8 x double> %x, %t
	%ty = fmul <8 x double> %y, %t1			%ty = fmul nsz <8 x double> %y, %t1
	%r = fadd <8 x double> %tx, %ty			%r = fadd nsz <8 x double> %tx, %ty
	ret <8 x double> %r			ret <8 x double> %r
	}			}

	;			;
	; Pattern: (fneg (fma x, y, z)) -> (fma x, -y, -z)			; Pattern: (fneg (fma x, y, z)) -> (fma x, -y, -z)
	;			;

	define <16 x float> @test_v16f32_fneg_fmadd(<16 x float> %a0, <16 x float> %a1, <16 x float> %a2) #0 {			define <16 x float> @test_v16f32_fneg_fmadd(<16 x float> %a0, <16 x float> %a1, <16 x float> %a2) #0 {
	; FMA-LABEL: test_v16f32_fneg_fmadd:			; FMA-LABEL: test_v16f32_fneg_fmadd:
	; FMA: # %bb.0:			; FMA: # %bb.0:
	; FMA-NEXT: vfnmsub213ps {{.*#+}} ymm0 = -(ymm2 * ymm0) - ymm4			; FMA-NEXT: vfnmsub213ps {{.*#+}} ymm0 = -(ymm2 * ymm0) - ymm4
	; FMA-NEXT: vfnmsub213ps {{.*#+}} ymm1 = -(ymm3 * ymm1) - ymm5			; FMA-NEXT: vfnmsub213ps {{.*#+}} ymm1 = -(ymm3 * ymm1) - ymm5
	; FMA-NEXT: retq			; FMA-NEXT: retq
	;			;
	; FMA4-LABEL: test_v16f32_fneg_fmadd:			; FMA4-LABEL: test_v16f32_fneg_fmadd:
	; FMA4: # %bb.0:			; FMA4: # %bb.0:
	; FMA4-NEXT: vfnmsubps {{.*#+}} ymm0 = -(ymm0 * ymm2) - ymm4			; FMA4-NEXT: vfnmsubps {{.*#+}} ymm0 = -(ymm0 * ymm2) - ymm4
	; FMA4-NEXT: vfnmsubps {{.*#+}} ymm1 = -(ymm1 * ymm3) - ymm5			; FMA4-NEXT: vfnmsubps {{.*#+}} ymm1 = -(ymm1 * ymm3) - ymm5
	; FMA4-NEXT: retq			; FMA4-NEXT: retq
	;			;
	; AVX512-LABEL: test_v16f32_fneg_fmadd:			; AVX512-LABEL: test_v16f32_fneg_fmadd:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vfnmsub213ps {{.*#+}} zmm0 = -(zmm1 * zmm0) - zmm2			; AVX512-NEXT: vfnmsub213ps {{.*#+}} zmm0 = -(zmm1 * zmm0) - zmm2
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%mul = fmul <16 x float> %a0, %a1			%mul = fmul nsz <16 x float> %a0, %a1
	%add = fadd <16 x float> %mul, %a2			%add = fadd nsz <16 x float> %mul, %a2
	%neg = fsub <16 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %add			%neg = fsub nsz <16 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %add
	ret <16 x float> %neg			ret <16 x float> %neg
	}			}

	define <8 x double> @test_v8f64_fneg_fmsub(<8 x double> %a0, <8 x double> %a1, <8 x double> %a2) #0 {			define <8 x double> @test_v8f64_fneg_fmsub(<8 x double> %a0, <8 x double> %a1, <8 x double> %a2) #0 {
	; FMA-LABEL: test_v8f64_fneg_fmsub:			; FMA-LABEL: test_v8f64_fneg_fmsub:
	; FMA: # %bb.0:			; FMA: # %bb.0:
	; FMA-NEXT: vfnmadd213pd {{.*#+}} ymm0 = -(ymm2 * ymm0) + ymm4			; FMA-NEXT: vfnmadd213pd {{.*#+}} ymm0 = -(ymm2 * ymm0) + ymm4
	; FMA-NEXT: vfnmadd213pd {{.*#+}} ymm1 = -(ymm3 * ymm1) + ymm5			; FMA-NEXT: vfnmadd213pd {{.*#+}} ymm1 = -(ymm3 * ymm1) + ymm5
	; FMA-NEXT: retq			; FMA-NEXT: retq
	;			;
	; FMA4-LABEL: test_v8f64_fneg_fmsub:			; FMA4-LABEL: test_v8f64_fneg_fmsub:
	; FMA4: # %bb.0:			; FMA4: # %bb.0:
	; FMA4-NEXT: vfnmaddpd {{.*#+}} ymm0 = -(ymm0 * ymm2) + ymm4			; FMA4-NEXT: vfnmaddpd {{.*#+}} ymm0 = -(ymm0 * ymm2) + ymm4
	; FMA4-NEXT: vfnmaddpd {{.*#+}} ymm1 = -(ymm1 * ymm3) + ymm5			; FMA4-NEXT: vfnmaddpd {{.*#+}} ymm1 = -(ymm1 * ymm3) + ymm5
	; FMA4-NEXT: retq			; FMA4-NEXT: retq
	;			;
	; AVX512-LABEL: test_v8f64_fneg_fmsub:			; AVX512-LABEL: test_v8f64_fneg_fmsub:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vfnmadd213pd {{.*#+}} zmm0 = -(zmm1 * zmm0) + zmm2			; AVX512-NEXT: vfnmadd213pd {{.*#+}} zmm0 = -(zmm1 * zmm0) + zmm2
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%mul = fmul <8 x double> %a0, %a1			%mul = fmul nsz <8 x double> %a0, %a1
	%sub = fsub <8 x double> %mul, %a2			%sub = fsub nsz <8 x double> %mul, %a2
	%neg = fsub <8 x double> <double -0.0, double -0.0, double -0.0, double -0.0, double -0.0, double -0.0, double -0.0, double -0.0>, %sub			%neg = fsub nsz <8 x double> <double -0.0, double -0.0, double -0.0, double -0.0, double -0.0, double -0.0, double -0.0, double -0.0>, %sub
	ret <8 x double> %neg			ret <8 x double> %neg
	}			}

	define <16 x float> @test_v16f32_fneg_fnmadd(<16 x float> %a0, <16 x float> %a1, <16 x float> %a2) #0 {			define <16 x float> @test_v16f32_fneg_fnmadd(<16 x float> %a0, <16 x float> %a1, <16 x float> %a2) #0 {
	; FMA-LABEL: test_v16f32_fneg_fnmadd:			; FMA-LABEL: test_v16f32_fneg_fnmadd:
	; FMA: # %bb.0:			; FMA: # %bb.0:
	; FMA-NEXT: vfmsub213ps {{.*#+}} ymm0 = (ymm2 * ymm0) - ymm4			; FMA-NEXT: vfmsub213ps {{.*#+}} ymm0 = (ymm2 * ymm0) - ymm4
	; FMA-NEXT: vfmsub213ps {{.*#+}} ymm1 = (ymm3 * ymm1) - ymm5			; FMA-NEXT: vfmsub213ps {{.*#+}} ymm1 = (ymm3 * ymm1) - ymm5
	; FMA-NEXT: retq			; FMA-NEXT: retq
	;			;
	; FMA4-LABEL: test_v16f32_fneg_fnmadd:			; FMA4-LABEL: test_v16f32_fneg_fnmadd:
	; FMA4: # %bb.0:			; FMA4: # %bb.0:
	; FMA4-NEXT: vfmsubps {{.*#+}} ymm0 = (ymm0 * ymm2) - ymm4			; FMA4-NEXT: vfmsubps {{.*#+}} ymm0 = (ymm0 * ymm2) - ymm4
	; FMA4-NEXT: vfmsubps {{.*#+}} ymm1 = (ymm1 * ymm3) - ymm5			; FMA4-NEXT: vfmsubps {{.*#+}} ymm1 = (ymm1 * ymm3) - ymm5
	; FMA4-NEXT: retq			; FMA4-NEXT: retq
	;			;
	; AVX512-LABEL: test_v16f32_fneg_fnmadd:			; AVX512-LABEL: test_v16f32_fneg_fnmadd:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vfmsub213ps {{.*#+}} zmm0 = (zmm1 * zmm0) - zmm2			; AVX512-NEXT: vfmsub213ps {{.*#+}} zmm0 = (zmm1 * zmm0) - zmm2
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%mul = fmul <16 x float> %a0, %a1			%mul = fmul nsz <16 x float> %a0, %a1
	%neg0 = fsub <16 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %mul			%neg0 = fsub nsz <16 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %mul
	%add = fadd <16 x float> %neg0, %a2			%add = fadd nsz <16 x float> %neg0, %a2
	%neg1 = fsub <16 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %add			%neg1 = fsub nsz <16 x float> <float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0, float -0.0>, %add
	ret <16 x float> %neg1			ret <16 x float> %neg1
	}			}

	define <8 x double> @test_v8f64_fneg_fnmsub(<8 x double> %a0, <8 x double> %a1, <8 x double> %a2) #0 {			define <8 x double> @test_v8f64_fneg_fnmsub(<8 x double> %a0, <8 x double> %a1, <8 x double> %a2) #0 {
	; FMA-LABEL: test_v8f64_fneg_fnmsub:			; FMA-LABEL: test_v8f64_fneg_fnmsub:
	; FMA: # %bb.0:			; FMA: # %bb.0:
	; FMA-NEXT: vfmadd213pd {{.*#+}} ymm0 = (ymm2 * ymm0) + ymm4			; FMA-NEXT: vfmadd213pd {{.*#+}} ymm0 = (ymm2 * ymm0) + ymm4
	; FMA-NEXT: vfmadd213pd {{.*#+}} ymm1 = (ymm3 * ymm1) + ymm5			; FMA-NEXT: vfmadd213pd {{.*#+}} ymm1 = (ymm3 * ymm1) + ymm5
	; FMA-NEXT: retq			; FMA-NEXT: retq
	;			;
	; FMA4-LABEL: test_v8f64_fneg_fnmsub:			; FMA4-LABEL: test_v8f64_fneg_fnmsub:
	; FMA4: # %bb.0:			; FMA4: # %bb.0:
	; FMA4-NEXT: vfmaddpd {{.*#+}} ymm0 = (ymm0 * ymm2) + ymm4			; FMA4-NEXT: vfmaddpd {{.*#+}} ymm0 = (ymm0 * ymm2) + ymm4
	; FMA4-NEXT: vfmaddpd {{.*#+}} ymm1 = (ymm1 * ymm3) + ymm5			; FMA4-NEXT: vfmaddpd {{.*#+}} ymm1 = (ymm1 * ymm3) + ymm5
	; FMA4-NEXT: retq			; FMA4-NEXT: retq
	;			;
	; AVX512-LABEL: test_v8f64_fneg_fnmsub:			; AVX512-LABEL: test_v8f64_fneg_fnmsub:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vfmadd213pd {{.*#+}} zmm0 = (zmm1 * zmm0) + zmm2			; AVX512-NEXT: vfmadd213pd {{.*#+}} zmm0 = (zmm1 * zmm0) + zmm2
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%mul = fmul <8 x double> %a0, %a1			%mul = fmul nsz <8 x double> %a0, %a1
	%neg0 = fsub <8 x double> <double -0.0, double -0.0, double -0.0, double -0.0, double -0.0, double -0.0, double -0.0, double -0.0>, %mul			%neg0 = fsub nsz <8 x double> <double -0.0, double -0.0, double -0.0, double -0.0, double -0.0, double -0.0, double -0.0, double -0.0>, %mul
	%sub = fsub <8 x double> %neg0, %a2			%sub = fsub nsz <8 x double> %neg0, %a2
	%neg1 = fsub <8 x double> <double -0.0, double -0.0, double -0.0, double -0.0, double -0.0, double -0.0, double -0.0, double -0.0>, %sub			%neg1 = fsub nsz <8 x double> <double -0.0, double -0.0, double -0.0, double -0.0, double -0.0, double -0.0, double -0.0, double -0.0>, %sub
	ret <8 x double> %neg1			ret <8 x double> %neg1
	}			}

	;			;
	; Pattern: (fma x, c1, (fmul x, c2)) -> (fmul x, c1+c2)			; Pattern: (fma x, c1, (fmul x, c2)) -> (fmul x, c1+c2)
	;			;

	define <16 x float> @test_v16f32_fma_x_c1_fmul_x_c2(<16 x float> %x) #0 {			define <16 x float> @test_v16f32_fma_x_c1_fmul_x_c2(<16 x float> %x) #0 {
	[131 lines not shown]

Revision Contents

Diff 347033

llvm/lib/Target/X86/X86ISelLowering.cpp

llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll

llvm/test/CodeGen/X86/fma-fneg-combine.ll

llvm/test/CodeGen/X86/fma-signed-zero.ll

llvm/test/CodeGen/X86/fma_patterns.ll

llvm/test/CodeGen/X86/fma_patterns_wide.ll
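
The `nsz` flags added to the tests in this revision matter because the fold named in the title can change the sign of a zero result. The following is a minimal sketch of that effect, not the Alive2 counterexample itself. It assumes an unfused multiply-add as a stand-in for `fma` (the product is exact for these inputs, so the sign of the result is the same), and the function names `negated_fma_of_negated` and `folded` are hypothetical, chosen only for illustration.

```python
import math


def negated_fma_of_negated(x, y, z):
    # Models (fneg (fma (fneg x), y, (fneg z))) with an unfused
    # multiply-add (an assumption made for this sketch).
    return -((-x) * y + (-z))


def folded(x, y, z):
    # Models the folded form (fma x, y, z), again unfused.
    return x * y + z


# Inputs chosen so that both forms produce a zero.
x, y, z = 0.0, 1.0, -0.0

before = negated_fma_of_negated(x, y, z)
after = folded(x, y, z)

# The two zeros compare equal, but their sign bits differ.
sign_before = math.copysign(1.0, before)
sign_after = math.copysign(1.0, after)
print(before == after, sign_before, sign_after)
```

With these inputs the unfolded expression yields a negative zero and the folded one a positive zero, so the two forms are interchangeable only when the sign of zero may be ignored, which is what `nsz` states.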
