This is an archive of the discontinued LLVM Phabricator instance.

X86-FMA3: Implemented commute transformations for FMA*_Int instructions
ClosedPublic

Authored by v_klochkov on Nov 10 2015, 12:43 PM.

Details

Reviewers

qcolombet
DavidKreitzer

Commits

rGcbc56baae6bf: X86-FMA3: Implemented commute transformations FMA*_Int instructions. It made it…
rL252973: X86-FMA3: Implemented commute transformations FMA*_Int instructions.

Summary

Hello,

Please review the patch that implements the commute transformations for
X86-FMA3 FMA*_Int opcodes (i.e. opcodes generated only for scalar FMA intrinsics).

Previously, the commute transformation was implemented for all FMA3 instructions
except FMA*_Int; please see D13269 for details.
So this change-set is mostly a minor tuning/update of the optimization introduced in D13269.

X86InstrFMA.td:
Set the 'isCommutable' attribute to 1 for FMA*_Int opcodes.

X86InstrInfo.cpp:
Added FMA*_Int opcodes to the isFMA3() routine.
Added a table containing the FMA*_Int opcodes in groups of three (132, 213, 231).

fma-intrinsics-x86.ll:
Tightened the checks.
Changed the FMA opcode generated for scalar intrinsics on Windows.
The generated code is different now because previously the FMA*_Int instructions
were not commutable. Now they are.
PeepholeOptimizer tries to fold memory operands starting from the 1st operand.
The 1st operand cannot be commuted with the 3rd (foldable) operand because that changes the intrinsic result.
The 2nd operand can be commuted with the 3rd, so it was commuted. The 2nd operand became the 3rd
(which also required changing the opcode from 213 to 132); then the operand was folded
with the load.
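
To illustrate the operand reasoning above, here is a minimal, self-contained C++ sketch (not LLVM code). It assumes a 4-lane register model in which a scalar FMA computes lane 0 and copies the upper lanes from the 1st operand, which is tied to the destination. The names Vec4, Form, scalarFma and formAfterSwap23 are hypothetical and exist only for this example.

```cpp
#include <array>
#include <cassert>

// Hypothetical model, not LLVM code: a 4-lane register and the three FMA3 forms.
using Vec4 = std::array<double, 4>;
enum class Form { F132, F213, F231 };

// Scalar (SS/SD-style) FMA: lane 0 is computed according to the form, lanes
// 1..3 are copied from the 1st operand, which is tied to the destination.
Vec4 scalarFma(Form F, const Vec4 &Op1, const Vec4 &Op2, const Vec4 &Op3) {
  Vec4 Dst = Op1; // upper lanes pass through from operand 1
  switch (F) {
  case Form::F132: Dst[0] = Op1[0] * Op3[0] + Op2[0]; break;
  case Form::F213: Dst[0] = Op2[0] * Op1[0] + Op3[0]; break;
  case Form::F231: Dst[0] = Op2[0] * Op3[0] + Op1[0]; break;
  }
  return Dst;
}

// Form to use after swapping operands 2 and 3 so that the result is
// unchanged: 132 <-> 213, while 231 stays 231.
Form formAfterSwap23(Form F) {
  switch (F) {
  case Form::F132: return Form::F213;
  case Form::F213: return Form::F132;
  case Form::F231: return Form::F231;
  }
  assert(false && "unknown FMA form");
  return F;
}
```

In this model, swapping operands 2 and 3 together with the form change leaves all four lanes unchanged for every form. Swapping operands 1 and 3 keeps lane 0 for the 132 form but changes the upper lanes, which is why that commute is only safe when users read just the lowest element.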

fma-commute-x86.ll:
Added some test cases for scalar intrinsics.

Thank you,
Slava

Diff Detail

Repository: rL LLVM

Event Timeline

v_klochkov updated this revision to Diff 39844. Nov 10 2015, 12:43 PM

v_klochkov retitled this revision to X86-FMA3: Implemented commute transformations for FMA*_Int instructions.

v_klochkov updated this object.

v_klochkov added a reviewer: DavidKreitzer.

v_klochkov added subscribers: llvm-commits, qcolombet.

Hi Slava,

I haven’t looked at the test cases closely yet, but I think the patch is pretty good.
I am suggesting one refactoring to avoid some code duplication.

When the refactoring is done, I’ll look closer into the test cases.

Thanks for working on this!

-Quentin

llvm/lib/Target/X86/X86InstrInfo.cpp
2976 ↗	(On Diff #39844)	Add an optional parameter: bool *IsIntrinsic = nullptr. Set it to false at the beginning of the function.
2994 ↗	(On Diff #39844)	Put all the intrinsic opcodes together to set the boolean to true and fall through to the non-intrinsic cases.
3502 ↗	(On Diff #39844)	At first, I believed the code would be simpler if we just merged this table with the existing one. I was guessing the table was kept separate because we want to know whether the opcode is an intrinsic. Although that is fair, I don’t like the code duplication this implies. Now, thinking about it, splitting the table is fine, but we can still avoid the code duplication. Indeed, let’s say we have a method that tells us beforehand that an opcode is an intrinsic. Using this knowledge, we can: (1) search only in the appropriate table; (2) reuse the same code for both tables, which is just a matter of setting the boundaries correctly. For instance, we could add an optional parameter to isFMA3 that says whether or not the opcode is an intrinsic, then update GroupsNum and some new OpcodeGroups-like variable.
3513 ↗	(On Diff #39844)	Call isFMA3 to get the intrinsic information. Set a new OpcodeGroups array to either the Intrinsic opcode array or the regular opcode array. Set GroupsNum accordingly.
3538 ↗	(On Diff #39844)	We wouldn’t need that loop anymore.
3541 ↗	(On Diff #39844)	This change is not required anymore.

Did additional refactoring suggested by Quentin:

added an optional parameter IsIntrinsic to isFMA3()
removed the duplicated code (loop) from getFMA3OpcodeToCommuteOperands().
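
The two refactoring steps listed above can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the patch itself: the Opcode enumerators are hypothetical stand-ins for the X86::VFMADD* opcodes, each table holds a single group, and findOpcodeGroup is a made-up helper name.

```cpp
#include <cstddef>

// Hypothetical opcode numbers standing in for the X86::VFMADD* enumerators.
enum Opcode : unsigned {
  Invalid = 0,
  FMADD132, FMADD213, FMADD231,             // regular opcodes
  FMADD132_Int, FMADD213_Int, FMADD231_Int, // opcodes used for intrinsics
};

// Returns true for any FMA3 opcode. The optional out-parameter reports
// whether the opcode belongs to the intrinsic (_Int) family.
bool isFMA3(unsigned Opc, bool *IsIntrinsic = nullptr) {
  if (IsIntrinsic)
    *IsIntrinsic = false;
  switch (Opc) {
  case FMADD132: case FMADD213: case FMADD231:
    return true;
  case FMADD132_Int: case FMADD213_Int: case FMADD231_Int:
    if (IsIntrinsic)
      *IsIntrinsic = true;
    return true;
  default:
    return false;
  }
}

// Finds the {132, 213, 231} group containing Opc, or nullptr if none does.
// One loop serves both tables; only the table pointer and its size differ.
const unsigned *findOpcodeGroup(unsigned Opc) {
  static const unsigned RegularGroups[][3] = {{FMADD132, FMADD213, FMADD231}};
  static const unsigned IntrinGroups[][3] = {
      {FMADD132_Int, FMADD213_Int, FMADD231_Int}};

  bool IsIntrinsic = false;
  if (!isFMA3(Opc, &IsIntrinsic))
    return nullptr;

  const unsigned (*Groups)[3] = IsIntrinsic ? IntrinGroups : RegularGroups;
  const std::size_t NumGroups =
      IsIntrinsic ? sizeof(IntrinGroups) / sizeof(IntrinGroups[0])
                  : sizeof(RegularGroups) / sizeof(RegularGroups[0]);

  for (std::size_t G = 0; G < NumGroups; ++G)
    for (unsigned Candidate : Groups[G])
      if (Candidate == Opc)
        return Groups[G];
  return nullptr;
}
```

Callers that only need the yes/no answer can keep calling isFMA3(Opc) unchanged, which is the main attraction of making the second parameter optional.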

Hi Quentin,

Thank you for such a quick code review.
I did the refactoring you suggested.
Please see the additional changes.

Thank you,
Slava

llvm/lib/Target/X86/X86InstrInfo.cpp
2976 ↗	(On Diff #39844)	Ok, done.
3502 ↗	(On Diff #39844)	Ok, good idea, thank you. I did the proposed additional changes.
3541 ↗	(On Diff #39844)	The search loop must find a group of 3 opcodes because this routine is called only after the isFMA3() check. So, technically, the 'return 0' statement is not reachable now. I kept this check just for additional safety: for example, someone else might add more opcodes to isFMA3() but not add them to the opcode groups defined in this routine. In such a scenario it is better to just return 0.

Hi Slava,

LGTM with a couple of comments.
Feel free to commit whenever you want.

Cheers,
-Quentin

llvm/lib/Target/X86/X86InstrInfo.cpp
3547 ↗	(On Diff #39982)	Good point! Turn that into an assert maybe? That way, if it happens, we would know there are things that can be improved :). Up to you.
llvm/test/CodeGen/X86/fma-commute-x86.ll
12 ↗	(On Diff #39982)	I am guessing you put regular expressions here because you want the test to be robust against scheduling changes. This is usually not the way we go, i.e., we tend to use the DAG construct for such cases. That being said, I do not remember whether DAG supports NEXT. Anyway, for now, we may just match the actual scheduling, unless you saw it changing?

This revision is now accepted and ready to land. Nov 11 2015, 3:49 PM

Thank you for the comments and for the approval!
-Slava

llvm/lib/Target/X86/X86InstrInfo.cpp
3547 ↗	(On Diff #39982)	I'll replace it with an assert. Thank you.
llvm/test/CodeGen/X86/fma-commute-x86.ll
12 ↗	(On Diff #39982)	Scheduling may depend on harmless changes. I think I noticed some irregularities in how the parameters are copied to registers (i.e. which param is read first). So, this change was kind of preventive. Please see my next comment below. It looks like this approach has already proved useful, as it avoids the need to fix such tests after harmless changes in code-gen.
llvm/test/CodeGen/X86/fma-intrinsics-x86.ll
13 ↗	(On Diff #39982)	This test is a good example of my motivation for changing the checks. There was a misprint at line 13: "rdi" was mistakenly used instead of "rdx". The test passed with that misprint, which means that the movaps from rcx came before the movaps from rdx. After I tried running this test with a harmless patch (hopefully it will be submitted for code review on Thursday, Nov 12), the test failed because the movaps from rdx, for some unknown reason, started appearing first. I also used "movap{{s|d}}" because in some cases movaps is used to prepare the operand for SD/PD instructions, and in some cases movapd is used.
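
As a rough illustration of why such a pattern tolerates both the movaps/movapd choice and either load order, the sketch below applies an equivalent alternation with std::regex. It rests on several assumptions: the pattern is modelled on the vmovap check lines in fma-commute-x86.ll with the parentheses escaped as a regex requires, whitespace is taken to be already normalized to single spaces, std::regex uses ECMAScript syntax rather than FileCheck's own regex engine, and matchesEitherLoad is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if Line is one of the two expected loads,
// spelled with either vmovaps or vmovapd.
bool matchesEitherLoad(const std::string &Line) {
  static const std::regex Pattern(
      R"re(vmovap(s|d) (\(%rdx\), %xmm0|\(%rcx\), %xmm1))re");
  return std::regex_match(Line, Pattern);
}
```

Because the same pattern is used for both check lines, the two loads may appear in either order. A line naming an unexpected register such as %rdi is still rejected.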

Closed by commit rL252973: X86-FMA3: Implemented commute transformations FMA*_Int instructions. (authored by v_klochkov). Nov 12 2015, 4:10 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Target/X86/X86InstrFMA.td	12 lines
llvm/trunk/lib/Target/X86/X86InstrInfo.cpp	324 lines
llvm/trunk/test/CodeGen/X86/fma-commute-x86.ll	255 lines
llvm/trunk/test/CodeGen/X86/fma-intrinsics-x86.ll	96 lines

Diff 40096

llvm/trunk/lib/Target/X86/X86InstrFMA.td

[158 lines not shown]	def m : FMA3<opc, MRMSrcMem, (outs RC:$dst),
(OpNode RC:$src2, RC:$src1, (load addr:$src3)))]>;		(OpNode RC:$src2, RC:$src1, (load addr:$src3)))]>;
}		}

// These FMA*_Int instructions are defined specially for being used when		// These FMA*_Int instructions are defined specially for being used when
// the scalar FMA intrinsics are lowered to machine instructions, and in that		// the scalar FMA intrinsics are lowered to machine instructions, and in that
// sence they are similar to existing ADD_Int, SUB_Int, MUL*_Int, etc.		// sence they are similar to existing ADD_Int, SUB_Int, MUL*_Int, etc.
// instructions.		// instructions.
//		//
// FIXME: The FMA*_Int instructions are TEMPORARILY defined as NOT commutable.		// All of the FMA*_Int opcodes are defined as commutable here.
// Commuting the 2nd and 3rd source register operands of FMAs is quite trivial		// Commuting the 2nd and 3rd source register operands of FMAs is quite trivial
// and the corresponding optimization has been developed (please see		// and the corresponding optimizations have been developed.
// http://reviews.llvm.org/D13269 for details). The optimization though needs
// some minor tuning to enable it for FMA*_Int opcodes.
// Commuting the 1st operand of FMA*_Int requires some additional analysis,		// Commuting the 1st operand of FMA*_Int requires some additional analysis,
// the commute optimization is legal only if all users of FMA*_Int use only		// the commute optimization is legal only if all users of FMA*_Int use only
// the lowest element of the FMA*_Int instruction.		// the lowest element of the FMA*_Int instruction. Even though such analysis
let Constraints = "$src1 = $dst", isCommutable = 0, isCodeGenOnly =1,		// may be not implemened yet we allow the routines doing the actual commute
		// transformation to decide if one or another instruction is commutable or not.
		let Constraints = "$src1 = $dst", isCommutable = 1, isCodeGenOnly = 1,
hasSideEffects = 0 in		hasSideEffects = 0 in
multiclass fma3s_rm_int<bits<8> opc, string OpcodeStr,		multiclass fma3s_rm_int<bits<8> opc, string OpcodeStr,
Operand memopr, RegisterClass RC> {		Operand memopr, RegisterClass RC> {
def r_Int : FMA3<opc, MRMSrcReg, (outs RC:$dst),		def r_Int : FMA3<opc, MRMSrcReg, (outs RC:$dst),
(ins RC:$src1, RC:$src2, RC:$src3),		(ins RC:$src1, RC:$src2, RC:$src3),
!strconcat(OpcodeStr,		!strconcat(OpcodeStr,
"\t{$src3, $src2, $dst|$dst, $src2, $src3}"),		"\t{$src3, $src2, $dst|$dst, $src2, $src3}"),
[]>;		[]>;
[260 lines not shown]

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp

[2,967 lines not shown]	X86InstrInfo::convertToThreeAddress(MachineFunction::iterator &MFI,
}		}

MFI->insert(MBBI, NewMI); // Insert the new inst		MFI->insert(MBBI, NewMI); // Insert the new inst
return NewMI;		return NewMI;
}		}

/// Returns true if the given instruction opcode is FMA3.		/// Returns true if the given instruction opcode is FMA3.
/// Otherwise, returns false.		/// Otherwise, returns false.
static bool isFMA3(unsigned Opcode) {		/// The second parameter is optional and is used as the second return from
		/// the function. It is set to true if the given instruction has FMA3 opcode
		/// that is used for lowering of scalar FMA intrinsics, and it is set to false
		/// otherwise.
		static bool isFMA3(unsigned Opcode, bool *IsIntrinsic = nullptr) {
		if (IsIntrinsic)
		*IsIntrinsic = false;

switch (Opcode) {		switch (Opcode) {
case X86::VFMADDSDr132r: case X86::VFMADDSDr132m:		case X86::VFMADDSDr132r: case X86::VFMADDSDr132m:
case X86::VFMADDSSr132r: case X86::VFMADDSSr132m:		case X86::VFMADDSSr132r: case X86::VFMADDSSr132m:
case X86::VFMSUBSDr132r: case X86::VFMSUBSDr132m:		case X86::VFMSUBSDr132r: case X86::VFMSUBSDr132m:
case X86::VFMSUBSSr132r: case X86::VFMSUBSSr132m:		case X86::VFMSUBSSr132r: case X86::VFMSUBSSr132m:
case X86::VFNMADDSDr132r: case X86::VFNMADDSDr132m:		case X86::VFNMADDSDr132r: case X86::VFNMADDSDr132m:
case X86::VFNMADDSSr132r: case X86::VFNMADDSSr132m:		case X86::VFNMADDSSr132r: case X86::VFNMADDSSr132m:
case X86::VFNMSUBSDr132r: case X86::VFNMSUBSDr132m:		case X86::VFNMSUBSDr132r: case X86::VFNMSUBSDr132m:
case X86::VFNMSUBSSr132r: case X86::VFNMSUBSSr132m:		case X86::VFNMSUBSSr132r: case X86::VFNMSUBSSr132m:

case X86::VFMADDSDr213r: case X86::VFMADDSDr213m:		case X86::VFMADDSDr213r: case X86::VFMADDSDr213m:
case X86::VFMADDSSr213r: case X86::VFMADDSSr213m:		case X86::VFMADDSSr213r: case X86::VFMADDSSr213m:
case X86::VFMSUBSDr213r: case X86::VFMSUBSDr213m:		case X86::VFMSUBSDr213r: case X86::VFMSUBSDr213m:
case X86::VFMSUBSSr213r: case X86::VFMSUBSSr213m:		case X86::VFMSUBSSr213r: case X86::VFMSUBSSr213m:
case X86::VFNMADDSDr213r: case X86::VFNMADDSDr213m:		case X86::VFNMADDSDr213r: case X86::VFNMADDSDr213m:
case X86::VFNMADDSSr213r: case X86::VFNMADDSSr213m:		case X86::VFNMADDSSr213r: case X86::VFNMADDSSr213m:
case X86::VFNMSUBSDr213r: case X86::VFNMSUBSDr213m:		case X86::VFNMSUBSDr213r: case X86::VFNMSUBSDr213m:
case X86::VFNMSUBSSr213r: case X86::VFNMSUBSSr213m:		case X86::VFNMSUBSSr213r: case X86::VFNMSUBSSr213m:

case X86::VFMADDSDr231r: case X86::VFMADDSDr231m:		case X86::VFMADDSDr231r: case X86::VFMADDSDr231m:
case X86::VFMADDSSr231r: case X86::VFMADDSSr231m:		case X86::VFMADDSSr231r: case X86::VFMADDSSr231m:
case X86::VFMSUBSDr231r: case X86::VFMSUBSDr231m:		case X86::VFMSUBSDr231r: case X86::VFMSUBSDr231m:
case X86::VFMSUBSSr231r: case X86::VFMSUBSSr231m:		case X86::VFMSUBSSr231r: case X86::VFMSUBSSr231m:
case X86::VFNMADDSDr231r: case X86::VFNMADDSDr231m:		case X86::VFNMADDSDr231r: case X86::VFNMADDSDr231m:
case X86::VFNMADDSSr231r: case X86::VFNMADDSSr231m:		case X86::VFNMADDSSr231r: case X86::VFNMADDSSr231m:
case X86::VFNMSUBSDr231r: case X86::VFNMSUBSDr231m:		case X86::VFNMSUBSDr231r: case X86::VFNMSUBSDr231m:
case X86::VFNMSUBSSr231r: case X86::VFNMSUBSSr231m:		case X86::VFNMSUBSSr231r: case X86::VFNMSUBSSr231m:

case X86::VFMADDSUBPDr132r: case X86::VFMADDSUBPDr132m:		case X86::VFMADDSUBPDr132r: case X86::VFMADDSUBPDr132m:
case X86::VFMADDSUBPSr132r: case X86::VFMADDSUBPSr132m:		case X86::VFMADDSUBPSr132r: case X86::VFMADDSUBPSr132m:
case X86::VFMSUBADDPDr132r: case X86::VFMSUBADDPDr132m:		case X86::VFMSUBADDPDr132r: case X86::VFMSUBADDPDr132m:
case X86::VFMSUBADDPSr132r: case X86::VFMSUBADDPSr132m:		case X86::VFMSUBADDPSr132r: case X86::VFMSUBADDPSr132m:
case X86::VFMADDSUBPDr132rY: case X86::VFMADDSUBPDr132mY:		case X86::VFMADDSUBPDr132rY: case X86::VFMADDSUBPDr132mY:
case X86::VFMADDSUBPSr132rY: case X86::VFMADDSUBPSr132mY:		case X86::VFMADDSUBPSr132rY: case X86::VFMADDSUBPSr132mY:
case X86::VFMSUBADDPDr132rY: case X86::VFMSUBADDPDr132mY:		case X86::VFMSUBADDPDr132rY: case X86::VFMSUBADDPDr132mY:
case X86::VFMSUBADDPSr132rY: case X86::VFMSUBADDPSr132mY:		case X86::VFMSUBADDPSr132rY: case X86::VFMSUBADDPSr132mY:

case X86::VFMADDPDr132r: case X86::VFMADDPDr132m:		case X86::VFMADDPDr132r: case X86::VFMADDPDr132m:
case X86::VFMADDPSr132r: case X86::VFMADDPSr132m:		case X86::VFMADDPSr132r: case X86::VFMADDPSr132m:
case X86::VFMSUBPDr132r: case X86::VFMSUBPDr132m:		case X86::VFMSUBPDr132r: case X86::VFMSUBPDr132m:
case X86::VFMSUBPSr132r: case X86::VFMSUBPSr132m:		case X86::VFMSUBPSr132r: case X86::VFMSUBPSr132m:
case X86::VFNMADDPDr132r: case X86::VFNMADDPDr132m:		case X86::VFNMADDPDr132r: case X86::VFNMADDPDr132m:
case X86::VFNMADDPSr132r: case X86::VFNMADDPSr132m:		case X86::VFNMADDPSr132r: case X86::VFNMADDPSr132m:
case X86::VFNMSUBPDr132r: case X86::VFNMSUBPDr132m:		case X86::VFNMSUBPDr132r: case X86::VFNMSUBPDr132m:
case X86::VFNMSUBPSr132r: case X86::VFNMSUBPSr132m:		case X86::VFNMSUBPSr132r: case X86::VFNMSUBPSr132m:
case X86::VFMADDPDr132rY: case X86::VFMADDPDr132mY:		case X86::VFMADDPDr132rY: case X86::VFMADDPDr132mY:
case X86::VFMADDPSr132rY: case X86::VFMADDPSr132mY:		case X86::VFMADDPSr132rY: case X86::VFMADDPSr132mY:
case X86::VFMSUBPDr132rY: case X86::VFMSUBPDr132mY:		case X86::VFMSUBPDr132rY: case X86::VFMSUBPDr132mY:
case X86::VFMSUBPSr132rY: case X86::VFMSUBPSr132mY:		case X86::VFMSUBPSr132rY: case X86::VFMSUBPSr132mY:
case X86::VFNMADDPDr132rY: case X86::VFNMADDPDr132mY:		case X86::VFNMADDPDr132rY: case X86::VFNMADDPDr132mY:
case X86::VFNMADDPSr132rY: case X86::VFNMADDPSr132mY:		case X86::VFNMADDPSr132rY: case X86::VFNMADDPSr132mY:
case X86::VFNMSUBPDr132rY: case X86::VFNMSUBPDr132mY:		case X86::VFNMSUBPDr132rY: case X86::VFNMSUBPDr132mY:
case X86::VFNMSUBPSr132rY: case X86::VFNMSUBPSr132mY:		case X86::VFNMSUBPSr132rY: case X86::VFNMSUBPSr132mY:

case X86::VFMADDSUBPDr213r: case X86::VFMADDSUBPDr213m:		case X86::VFMADDSUBPDr213r: case X86::VFMADDSUBPDr213m:
case X86::VFMADDSUBPSr213r: case X86::VFMADDSUBPSr213m:		case X86::VFMADDSUBPSr213r: case X86::VFMADDSUBPSr213m:
case X86::VFMSUBADDPDr213r: case X86::VFMSUBADDPDr213m:		case X86::VFMSUBADDPDr213r: case X86::VFMSUBADDPDr213m:
case X86::VFMSUBADDPSr213r: case X86::VFMSUBADDPSr213m:		case X86::VFMSUBADDPSr213r: case X86::VFMSUBADDPSr213m:
case X86::VFMADDSUBPDr213rY: case X86::VFMADDSUBPDr213mY:		case X86::VFMADDSUBPDr213rY: case X86::VFMADDSUBPDr213mY:
case X86::VFMADDSUBPSr213rY: case X86::VFMADDSUBPSr213mY:		case X86::VFMADDSUBPSr213rY: case X86::VFMADDSUBPSr213mY:
case X86::VFMSUBADDPDr213rY: case X86::VFMSUBADDPDr213mY:		case X86::VFMSUBADDPDr213rY: case X86::VFMSUBADDPDr213mY:
case X86::VFMSUBADDPSr213rY: case X86::VFMSUBADDPSr213mY:		case X86::VFMSUBADDPSr213rY: case X86::VFMSUBADDPSr213mY:

case X86::VFMADDPDr213r: case X86::VFMADDPDr213m:		case X86::VFMADDPDr213r: case X86::VFMADDPDr213m:
case X86::VFMADDPSr213r: case X86::VFMADDPSr213m:		case X86::VFMADDPSr213r: case X86::VFMADDPSr213m:
case X86::VFMSUBPDr213r: case X86::VFMSUBPDr213m:		case X86::VFMSUBPDr213r: case X86::VFMSUBPDr213m:
case X86::VFMSUBPSr213r: case X86::VFMSUBPSr213m:		case X86::VFMSUBPSr213r: case X86::VFMSUBPSr213m:
case X86::VFNMADDPDr213r: case X86::VFNMADDPDr213m:		case X86::VFNMADDPDr213r: case X86::VFNMADDPDr213m:
case X86::VFNMADDPSr213r: case X86::VFNMADDPSr213m:		case X86::VFNMADDPSr213r: case X86::VFNMADDPSr213m:
case X86::VFNMSUBPDr213r: case X86::VFNMSUBPDr213m:		case X86::VFNMSUBPDr213r: case X86::VFNMSUBPDr213m:
case X86::VFNMSUBPSr213r: case X86::VFNMSUBPSr213m:		case X86::VFNMSUBPSr213r: case X86::VFNMSUBPSr213m:
case X86::VFMADDPDr213rY: case X86::VFMADDPDr213mY:		case X86::VFMADDPDr213rY: case X86::VFMADDPDr213mY:
case X86::VFMADDPSr213rY: case X86::VFMADDPSr213mY:		case X86::VFMADDPSr213rY: case X86::VFMADDPSr213mY:
case X86::VFMSUBPDr213rY: case X86::VFMSUBPDr213mY:		case X86::VFMSUBPDr213rY: case X86::VFMSUBPDr213mY:
case X86::VFMSUBPSr213rY: case X86::VFMSUBPSr213mY:		case X86::VFMSUBPSr213rY: case X86::VFMSUBPSr213mY:
case X86::VFNMADDPDr213rY: case X86::VFNMADDPDr213mY:		case X86::VFNMADDPDr213rY: case X86::VFNMADDPDr213mY:
case X86::VFNMADDPSr213rY: case X86::VFNMADDPSr213mY:		case X86::VFNMADDPSr213rY: case X86::VFNMADDPSr213mY:
case X86::VFNMSUBPDr213rY: case X86::VFNMSUBPDr213mY:		case X86::VFNMSUBPDr213rY: case X86::VFNMSUBPDr213mY:
case X86::VFNMSUBPSr213rY: case X86::VFNMSUBPSr213mY:		case X86::VFNMSUBPSr213rY: case X86::VFNMSUBPSr213mY:

case X86::VFMADDSUBPDr231r: case X86::VFMADDSUBPDr231m:		case X86::VFMADDSUBPDr231r: case X86::VFMADDSUBPDr231m:
case X86::VFMADDSUBPSr231r: case X86::VFMADDSUBPSr231m:		case X86::VFMADDSUBPSr231r: case X86::VFMADDSUBPSr231m:
case X86::VFMSUBADDPDr231r: case X86::VFMSUBADDPDr231m:		case X86::VFMSUBADDPDr231r: case X86::VFMSUBADDPDr231m:
case X86::VFMSUBADDPSr231r: case X86::VFMSUBADDPSr231m:		case X86::VFMSUBADDPSr231r: case X86::VFMSUBADDPSr231m:
case X86::VFMADDSUBPDr231rY: case X86::VFMADDSUBPDr231mY:		case X86::VFMADDSUBPDr231rY: case X86::VFMADDSUBPDr231mY:
case X86::VFMADDSUBPSr231rY: case X86::VFMADDSUBPSr231mY:		case X86::VFMADDSUBPSr231rY: case X86::VFMADDSUBPSr231mY:
case X86::VFMSUBADDPDr231rY: case X86::VFMSUBADDPDr231mY:		case X86::VFMSUBADDPDr231rY: case X86::VFMSUBADDPDr231mY:
case X86::VFMSUBADDPSr231rY: case X86::VFMSUBADDPSr231mY:		case X86::VFMSUBADDPSr231rY: case X86::VFMSUBADDPSr231mY:

case X86::VFMADDPDr231r: case X86::VFMADDPDr231m:		case X86::VFMADDPDr231r: case X86::VFMADDPDr231m:
case X86::VFMADDPSr231r: case X86::VFMADDPSr231m:		case X86::VFMADDPSr231r: case X86::VFMADDPSr231m:
case X86::VFMSUBPDr231r: case X86::VFMSUBPDr231m:		case X86::VFMSUBPDr231r: case X86::VFMSUBPDr231m:
case X86::VFMSUBPSr231r: case X86::VFMSUBPSr231m:		case X86::VFMSUBPSr231r: case X86::VFMSUBPSr231m:
case X86::VFNMADDPDr231r: case X86::VFNMADDPDr231m:		case X86::VFNMADDPDr231r: case X86::VFNMADDPDr231m:
case X86::VFNMADDPSr231r: case X86::VFNMADDPSr231m:		case X86::VFNMADDPSr231r: case X86::VFNMADDPSr231m:
case X86::VFNMSUBPDr231r: case X86::VFNMSUBPDr231m:		case X86::VFNMSUBPDr231r: case X86::VFNMSUBPDr231m:
case X86::VFNMSUBPSr231r: case X86::VFNMSUBPSr231m:		case X86::VFNMSUBPSr231r: case X86::VFNMSUBPSr231m:
case X86::VFMADDPDr231rY: case X86::VFMADDPDr231mY:		case X86::VFMADDPDr231rY: case X86::VFMADDPDr231mY:
case X86::VFMADDPSr231rY: case X86::VFMADDPSr231mY:		case X86::VFMADDPSr231rY: case X86::VFMADDPSr231mY:
case X86::VFMSUBPDr231rY: case X86::VFMSUBPDr231mY:		case X86::VFMSUBPDr231rY: case X86::VFMSUBPDr231mY:
case X86::VFMSUBPSr231rY: case X86::VFMSUBPSr231mY:		case X86::VFMSUBPSr231rY: case X86::VFMSUBPSr231mY:
case X86::VFNMADDPDr231rY: case X86::VFNMADDPDr231mY:		case X86::VFNMADDPDr231rY: case X86::VFNMADDPDr231mY:
case X86::VFNMADDPSr231rY: case X86::VFNMADDPSr231mY:		case X86::VFNMADDPSr231rY: case X86::VFNMADDPSr231mY:
case X86::VFNMSUBPDr231rY: case X86::VFNMSUBPDr231mY:		case X86::VFNMSUBPDr231rY: case X86::VFNMSUBPDr231mY:
case X86::VFNMSUBPSr231rY: case X86::VFNMSUBPSr231mY:		case X86::VFNMSUBPSr231rY: case X86::VFNMSUBPSr231mY:
return true;		return true;

		case X86::VFMADDSDr132r_Int: case X86::VFMADDSDr132m_Int:
		case X86::VFMADDSSr132r_Int: case X86::VFMADDSSr132m_Int:
		case X86::VFMSUBSDr132r_Int: case X86::VFMSUBSDr132m_Int:
		case X86::VFMSUBSSr132r_Int: case X86::VFMSUBSSr132m_Int:
		case X86::VFNMADDSDr132r_Int: case X86::VFNMADDSDr132m_Int:
		case X86::VFNMADDSSr132r_Int: case X86::VFNMADDSSr132m_Int:
		case X86::VFNMSUBSDr132r_Int: case X86::VFNMSUBSDr132m_Int:
		case X86::VFNMSUBSSr132r_Int: case X86::VFNMSUBSSr132m_Int:

		case X86::VFMADDSDr213r_Int: case X86::VFMADDSDr213m_Int:
		case X86::VFMADDSSr213r_Int: case X86::VFMADDSSr213m_Int:
		case X86::VFMSUBSDr213r_Int: case X86::VFMSUBSDr213m_Int:
		case X86::VFMSUBSSr213r_Int: case X86::VFMSUBSSr213m_Int:
		case X86::VFNMADDSDr213r_Int: case X86::VFNMADDSDr213m_Int:
		case X86::VFNMADDSSr213r_Int: case X86::VFNMADDSSr213m_Int:
		case X86::VFNMSUBSDr213r_Int: case X86::VFNMSUBSDr213m_Int:
		case X86::VFNMSUBSSr213r_Int: case X86::VFNMSUBSSr213m_Int:

		case X86::VFMADDSDr231r_Int: case X86::VFMADDSDr231m_Int:
		case X86::VFMADDSSr231r_Int: case X86::VFMADDSSr231m_Int:
		case X86::VFMSUBSDr231r_Int: case X86::VFMSUBSDr231m_Int:
		case X86::VFMSUBSSr231r_Int: case X86::VFMSUBSSr231m_Int:
		case X86::VFNMADDSDr231r_Int: case X86::VFNMADDSDr231m_Int:
		case X86::VFNMADDSSr231r_Int: case X86::VFNMADDSSr231m_Int:
		case X86::VFNMSUBSDr231r_Int: case X86::VFNMSUBSDr231m_Int:
		case X86::VFNMSUBSSr231r_Int: case X86::VFNMSUBSSr231m_Int:
		if (IsIntrinsic)
		*IsIntrinsic = true;
		return true;
default:		default:
return false;		return false;
}		}
llvm_unreachable("Opcode not handled by the switch");		llvm_unreachable("Opcode not handled by the switch");
}		}

MachineInstr X86InstrInfo::commuteInstructionImpl(MachineInstr MI,		MachineInstr X86InstrInfo::commuteInstructionImpl(MachineInstr MI,
bool NewMI,		bool NewMI,
[282 lines not shown]

unsigned X86InstrInfo::getFMA3OpcodeToCommuteOperands(MachineInstr *MI,		unsigned X86InstrInfo::getFMA3OpcodeToCommuteOperands(MachineInstr *MI,
unsigned SrcOpIdx1,		unsigned SrcOpIdx1,
unsigned SrcOpIdx2) const {		unsigned SrcOpIdx2) const {
unsigned Opc = MI->getOpcode();		unsigned Opc = MI->getOpcode();

// Define the array that holds FMA opcodes in groups		// Define the array that holds FMA opcodes in groups
// of 3 opcodes(132, 213, 231) in each group.		// of 3 opcodes(132, 213, 231) in each group.
static const unsigned OpcodeGroups[][3] = {		static const unsigned RegularOpcodeGroups[][3] = {
{ X86::VFMADDSSr132r, X86::VFMADDSSr213r, X86::VFMADDSSr231r },		{ X86::VFMADDSSr132r, X86::VFMADDSSr213r, X86::VFMADDSSr231r },
{ X86::VFMADDSDr132r, X86::VFMADDSDr213r, X86::VFMADDSDr231r },		{ X86::VFMADDSDr132r, X86::VFMADDSDr213r, X86::VFMADDSDr231r },
{ X86::VFMADDPSr132r, X86::VFMADDPSr213r, X86::VFMADDPSr231r },		{ X86::VFMADDPSr132r, X86::VFMADDPSr213r, X86::VFMADDPSr231r },
{ X86::VFMADDPDr132r, X86::VFMADDPDr213r, X86::VFMADDPDr231r },		{ X86::VFMADDPDr132r, X86::VFMADDPDr213r, X86::VFMADDPDr231r },
{ X86::VFMADDPSr132rY, X86::VFMADDPSr213rY, X86::VFMADDPSr231rY },		{ X86::VFMADDPSr132rY, X86::VFMADDPSr213rY, X86::VFMADDPSr231rY },
{ X86::VFMADDPDr132rY, X86::VFMADDPDr213rY, X86::VFMADDPDr231rY },		{ X86::VFMADDPDr132rY, X86::VFMADDPDr213rY, X86::VFMADDPDr231rY },
{ X86::VFMADDSSr132m, X86::VFMADDSSr213m, X86::VFMADDSSr231m },		{ X86::VFMADDSSr132m, X86::VFMADDSSr213m, X86::VFMADDSSr231m },
{ X86::VFMADDSDr132m, X86::VFMADDSDr213m, X86::VFMADDSDr231m },		{ X86::VFMADDSDr132m, X86::VFMADDSDr213m, X86::VFMADDSDr231m },
[54 lines not shown]	static const unsigned RegularOpcodeGroups[][3] = {
{ X86::VFMSUBADDPDr132r, X86::VFMSUBADDPDr213r, X86::VFMSUBADDPDr231r },		{ X86::VFMSUBADDPDr132r, X86::VFMSUBADDPDr213r, X86::VFMSUBADDPDr231r },
{ X86::VFMSUBADDPSr132rY, X86::VFMSUBADDPSr213rY, X86::VFMSUBADDPSr231rY },		{ X86::VFMSUBADDPSr132rY, X86::VFMSUBADDPSr213rY, X86::VFMSUBADDPSr231rY },
{ X86::VFMSUBADDPDr132rY, X86::VFMSUBADDPDr213rY, X86::VFMSUBADDPDr231rY },		{ X86::VFMSUBADDPDr132rY, X86::VFMSUBADDPDr213rY, X86::VFMSUBADDPDr231rY },
{ X86::VFMSUBADDPSr132m, X86::VFMSUBADDPSr213m, X86::VFMSUBADDPSr231m },		{ X86::VFMSUBADDPSr132m, X86::VFMSUBADDPSr213m, X86::VFMSUBADDPSr231m },
{ X86::VFMSUBADDPDr132m, X86::VFMSUBADDPDr213m, X86::VFMSUBADDPDr231m },		{ X86::VFMSUBADDPDr132m, X86::VFMSUBADDPDr213m, X86::VFMSUBADDPDr231m },
{ X86::VFMSUBADDPSr132mY, X86::VFMSUBADDPSr213mY, X86::VFMSUBADDPSr231mY },		{ X86::VFMSUBADDPSr132mY, X86::VFMSUBADDPSr213mY, X86::VFMSUBADDPSr231mY },
{ X86::VFMSUBADDPDr132mY, X86::VFMSUBADDPDr213mY, X86::VFMSUBADDPDr231mY }		{ X86::VFMSUBADDPDr132mY, X86::VFMSUBADDPDr213mY, X86::VFMSUBADDPDr231mY }
};		};

		// Define the array that holds FMA*_Int opcodes in groups
		// of 3 opcodes(132, 213, 231) in each group.
		static const unsigned IntrinOpcodeGroups[][3] = {
		{ X86::VFMADDSSr132r_Int, X86::VFMADDSSr213r_Int, X86::VFMADDSSr231r_Int },
		{ X86::VFMADDSDr132r_Int, X86::VFMADDSDr213r_Int, X86::VFMADDSDr231r_Int },
		{ X86::VFMADDSSr132m_Int, X86::VFMADDSSr213m_Int, X86::VFMADDSSr231m_Int },
		{ X86::VFMADDSDr132m_Int, X86::VFMADDSDr213m_Int, X86::VFMADDSDr231m_Int },

		{ X86::VFMSUBSSr132r_Int, X86::VFMSUBSSr213r_Int, X86::VFMSUBSSr231r_Int },
		{ X86::VFMSUBSDr132r_Int, X86::VFMSUBSDr213r_Int, X86::VFMSUBSDr231r_Int },
		{ X86::VFMSUBSSr132m_Int, X86::VFMSUBSSr213m_Int, X86::VFMSUBSSr231m_Int },
		{ X86::VFMSUBSDr132m_Int, X86::VFMSUBSDr213m_Int, X86::VFMSUBSDr231m_Int },

		{ X86::VFNMADDSSr132r_Int, X86::VFNMADDSSr213r_Int, X86::VFNMADDSSr231r_Int },
		{ X86::VFNMADDSDr132r_Int, X86::VFNMADDSDr213r_Int, X86::VFNMADDSDr231r_Int },
		{ X86::VFNMADDSSr132m_Int, X86::VFNMADDSSr213m_Int, X86::VFNMADDSSr231m_Int },
		{ X86::VFNMADDSDr132m_Int, X86::VFNMADDSDr213m_Int, X86::VFNMADDSDr231m_Int },

		{ X86::VFNMSUBSSr132r_Int, X86::VFNMSUBSSr213r_Int, X86::VFNMSUBSSr231r_Int },
		{ X86::VFNMSUBSDr132r_Int, X86::VFNMSUBSDr213r_Int, X86::VFNMSUBSDr231r_Int },
		{ X86::VFNMSUBSSr132m_Int, X86::VFNMSUBSSr213m_Int, X86::VFNMSUBSSr231m_Int },
		{ X86::VFNMSUBSDr132m_Int, X86::VFNMSUBSDr213m_Int, X86::VFNMSUBSDr231m_Int },
		};

const unsigned Form132Index = 0;		const unsigned Form132Index = 0;
const unsigned Form213Index = 1;		const unsigned Form213Index = 1;
const unsigned Form231Index = 2;		const unsigned Form231Index = 2;
const unsigned FormsNum = 3;		const unsigned FormsNum = 3;

// Look for the input opcode in the OpcodeGroups table.		bool IsIntrinOpcode;
unsigned OpcodeGroupsNum = sizeof(OpcodeGroups) / sizeof(OpcodeGroups[0]);		isFMA3(Opc, &IsIntrinOpcode);
unsigned GroupIndex = 0, FormIndex = FormsNum;
for (; GroupIndex < OpcodeGroupsNum && FormIndex == FormsNum; GroupIndex++) {		unsigned GroupsNum;
		const unsigned (*OpcodeGroups)[3];
		if (IsIntrinOpcode) {
		GroupsNum = sizeof(IntrinOpcodeGroups) / sizeof(IntrinOpcodeGroups[0]);
		OpcodeGroups = IntrinOpcodeGroups;
		} else {
		GroupsNum = sizeof(RegularOpcodeGroups) / sizeof(RegularOpcodeGroups[0]);
		OpcodeGroups = RegularOpcodeGroups;
		}

		const unsigned *FoundOpcodesGroup = nullptr;
		unsigned FormIndex;

		// Look for the input opcode in the corresponding opcodes table.
		unsigned GroupIndex = 0;
		for (; GroupIndex < GroupsNum && !FoundOpcodesGroup; GroupIndex++) {
for (FormIndex = 0; FormIndex < FormsNum; FormIndex++) {		for (FormIndex = 0; FormIndex < FormsNum; FormIndex++) {
if (OpcodeGroups[GroupIndex][FormIndex] == Opc)		if (OpcodeGroups[GroupIndex][FormIndex] == Opc) {
		FoundOpcodesGroup = OpcodeGroups[GroupIndex];
break;		break;
}		}
}		}
// Input opcode does not match with any of the opcodes from the table.		}
if (FormIndex == FormsNum)
return 0;		// The input opcode does not match with any of the opcodes from the tables.
// Do not forget to fix the GroupIndex after the loop.		// The unsupported FMA opcode must be added to one of the two opcode groups
GroupIndex--;		// defined above.
		assert(FoundOpcodesGroup != nullptr && "Unexpected FMA3 opcode");

// Put the lowest index to SrcOpIdx1 to simplify the checks below.		// Put the lowest index to SrcOpIdx1 to simplify the checks below.
if (SrcOpIdx1 > SrcOpIdx2)		if (SrcOpIdx1 > SrcOpIdx2)
std::swap(SrcOpIdx1, SrcOpIdx2);		std::swap(SrcOpIdx1, SrcOpIdx2);

		// TODO: Commuting the 1st operand of FMA*_Int requires some additional
		// analysis. The commute optimization is legal only if all users of FMA*_Int
		// use only the lowest element of the FMA*_Int instruction. Such analysis are
		// not implemented yet. So, just return 0 in that case.
		// When such analysis are available this place will be the right place for
		// calling it.
		if (IsIntrinOpcode && SrcOpIdx1 == 1)
		return 0;

unsigned Case;		unsigned Case;
if (SrcOpIdx1 == 1 && SrcOpIdx2 == 2)		if (SrcOpIdx1 == 1 && SrcOpIdx2 == 2)
Case = 0;		Case = 0;
else if (SrcOpIdx1 == 1 && SrcOpIdx2 == 3)		else if (SrcOpIdx1 == 1 && SrcOpIdx2 == 3)
Case = 1;		Case = 1;
else if (SrcOpIdx1 == 2 && SrcOpIdx2 == 3)		else if (SrcOpIdx1 == 2 && SrcOpIdx2 == 3)
Case = 2;		Case = 2;
else		else
return 0;		return 0;

[15 lines not shown]	static const unsigned FormMapping[][3] = {
// FMA132 a, C, B; ==> FMA213 a, B, C;		// FMA132 a, C, B; ==> FMA213 a, B, C;
// FMA213 b, A, C; ==> FMA132 b, C, A;		// FMA213 b, A, C; ==> FMA132 b, C, A;
// FMA231 c, A, B; ==> FMA231 c, B, A;		// FMA231 c, A, B; ==> FMA231 c, B, A;
{ Form213Index, Form132Index, Form231Index }		{ Form213Index, Form132Index, Form231Index }
};		};

// Everything is ready, just adjust the FMA opcode and return it.		// Everything is ready, just adjust the FMA opcode and return it.
FormIndex = FormMapping[Case][FormIndex];		FormIndex = FormMapping[Case][FormIndex];
return OpcodeGroups[GroupIndex][FormIndex];		return FoundOpcodesGroup[FormIndex];
}		}

bool X86InstrInfo::findCommutedOpIndices(MachineInstr *MI,		bool X86InstrInfo::findCommutedOpIndices(MachineInstr *MI,
unsigned &SrcOpIdx1,		unsigned &SrcOpIdx1,
unsigned &SrcOpIdx2) const {		unsigned &SrcOpIdx2) const {
switch (MI->getOpcode()) {		switch (MI->getOpcode()) {
case X86::CMPPDrri:		case X86::CMPPDrri:
case X86::CMPPSrri:		case X86::CMPPSrri:
[3,588 lines not shown]

llvm/trunk/test/CodeGen/X86/fma-commute-x86.ll

	; RUN: llc < %s -mtriple=x86_64-pc-win32 -mcpu=core-avx2 | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-pc-win32 -mcpu=core-avx2 | FileCheck %s
	; RUN: llc < %s -mtriple=x86_64-pc-win32 -mattr=+fma,+fma4 | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-pc-win32 -mattr=+fma,+fma4 | FileCheck %s
	; RUN: llc < %s -mcpu=bdver2 -mtriple=x86_64-pc-win32 -mattr=-fma4 | FileCheck %s			; RUN: llc < %s -mcpu=bdver2 -mtriple=x86_64-pc-win32 -mattr=-fma4 | FileCheck %s

	attributes #0 = { nounwind }			attributes #0 = { nounwind }

				declare <4 x float> @llvm.x86.fma.vfmadd.ss(<4 x float>, <4 x float>, <4 x float>) nounwind readnone
				define <4 x float> @test_x86_fmadd_baa_ss(<4 x float> %a, <4 x float> %b) #0 {
				; CHECK-LABEL: test_x86_fmadd_baa_ss:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%rcx\), %xmm1}}
				; CHECK-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%rcx\), %xmm1}}
				; CHECK-NEXT: vfmadd213ss %xmm1, %xmm1, %xmm0
				; CHECK-NEXT: retq
				%res = call <4 x float> @llvm.x86.fma.vfmadd.ss(<4 x float> %b, <4 x float> %a, <4 x float> %a) nounwind
				ret <4 x float> %res
				}
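
The scalar tests depend on operand 1 being special: it is also the destination, so it supplies the upper lanes of the result. The following is a minimal sketch of that behaviour, assuming the scalar FMA3 instructions compute only lane 0 and leave lanes 1..3 of operand 1 unchanged. The type V4 and the helper names are hypothetical and exist only for this illustration.

```cpp
#include <array>

using V4 = std::array<float, 4>;

// Model of a scalar 213 form: lane 0 = op2 * op1 + op3.
// Op1 is taken by value because it doubles as the destination register.
V4 fmadd213ss(V4 Op1, const V4 &Op2, const V4 &Op3) {
  Op1[0] = Op2[0] * Op1[0] + Op3[0];
  return Op1; // lanes 1..3 still come from operand 1
}

// Model of a scalar 132 form: lane 0 = op1 * op3 + op2.
V4 fmadd132ss(V4 Op1, const V4 &Op2, const V4 &Op3) {
  Op1[0] = Op1[0] * Op3[0] + Op2[0];
  return Op1;
}

// Swapping operands 2 and 3 while switching 213 -> 132 keeps every lane.
bool swap23IsSafe(const V4 &A, const V4 &B, const V4 &C) {
  return fmadd213ss(A, B, C) == fmadd132ss(A, C, B);
}

// Swapping operands 1 and 2 keeps lane 0 but changes the upper lanes.
bool swap12IsSafe(const V4 &A, const V4 &B, const V4 &C) {
  return fmadd213ss(A, B, C) == fmadd213ss(B, A, C);
}
```

With three vectors whose upper lanes differ, swap23IsSafe holds and swap12IsSafe does not, which is consistent with the summary's remark that only the 2nd and 3rd operands of the FMA*_Int opcodes can be exchanged.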

				define <4 x float> @test_x86_fmadd_aba_ss(<4 x float> %a, <4 x float> %b) #0 {
				; CHECK-LABEL: test_x86_fmadd_aba_ss:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovaps (%rcx), %xmm0
				; CHECK-NEXT: vfmadd132ss (%rdx), %xmm0, %xmm0
				; CHECK-NEXT: retq
				%res = call <4 x float> @llvm.x86.fma.vfmadd.ss(<4 x float> %a, <4 x float> %b, <4 x float> %a) nounwind
				ret <4 x float> %res
				}

				define <4 x float> @test_x86_fmadd_bba_ss(<4 x float> %a, <4 x float> %b) #0 {
				; CHECK-LABEL: test_x86_fmadd_bba_ss:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovaps (%rdx), %xmm0
				; CHECK-NEXT: vfmadd213ss (%rcx), %xmm0, %xmm0
				; CHECK-NEXT: retq
				%res = call <4 x float> @llvm.x86.fma.vfmadd.ss(<4 x float> %b, <4 x float> %b, <4 x float> %a) nounwind
				ret <4 x float> %res
				}

	declare <4 x float> @llvm.x86.fma.vfmadd.ps(<4 x float>, <4 x float>, <4 x float>) nounwind readnone			declare <4 x float> @llvm.x86.fma.vfmadd.ps(<4 x float>, <4 x float>, <4 x float>) nounwind readnone
	define <4 x float> @test_x86_fmadd_baa_ps(<4 x float> %a, <4 x float> %b) #0 {			define <4 x float> @test_x86_fmadd_baa_ps(<4 x float> %a, <4 x float> %b) #0 {
	; CHECK-LABEL: test_x86_fmadd_baa_ps:			; CHECK-LABEL: test_x86_fmadd_baa_ps:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: vmovaps (%rcx), %xmm0			; CHECK-NEXT: vmovaps (%rcx), %xmm0
	; CHECK-NEXT: vfmadd132ps (%rdx), %xmm0, %xmm0			; CHECK-NEXT: vfmadd132ps (%rdx), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <4 x float> @llvm.x86.fma.vfmadd.ps(<4 x float> %b, <4 x float> %a, <4 x float> %a) nounwind			%res = call <4 x float> @llvm.x86.fma.vfmadd.ps(<4 x float> %b, <4 x float> %a, <4 x float> %a) nounwind
	...
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: vmovaps (%rdx), %ymm0			; CHECK-NEXT: vmovaps (%rdx), %ymm0
	; CHECK-NEXT: vfmadd213ps (%rcx), %ymm0, %ymm0			; CHECK-NEXT: vfmadd213ps (%rcx), %ymm0, %ymm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <8 x float> @llvm.x86.fma.vfmadd.ps.256(<8 x float> %b, <8 x float> %b, <8 x float> %a) nounwind			%res = call <8 x float> @llvm.x86.fma.vfmadd.ps.256(<8 x float> %b, <8 x float> %b, <8 x float> %a) nounwind
	ret <8 x float> %res			ret <8 x float> %res
	}			}

				declare <2 x double> @llvm.x86.fma.vfmadd.sd(<2 x double>, <2 x double>, <2 x double>) nounwind readnone
				define <2 x double> @test_x86_fmadd_baa_sd(<2 x double> %a, <2 x double> %b) #0 {
				; CHECK-LABEL: test_x86_fmadd_baa_sd:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%rcx\), %xmm1}}
				; CHECK-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%rcx\), %xmm1}}
				; CHECK-NEXT: vfmadd213sd %xmm1, %xmm1, %xmm0
				; CHECK-NEXT: retq
				%res = call <2 x double> @llvm.x86.fma.vfmadd.sd(<2 x double> %b, <2 x double> %a, <2 x double> %a) nounwind
				ret <2 x double> %res
				}

				define <2 x double> @test_x86_fmadd_aba_sd(<2 x double> %a, <2 x double> %b) #0 {
				; CHECK-LABEL: test_x86_fmadd_aba_sd:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovaps (%rcx), %xmm0
				; CHECK-NEXT: vfmadd132sd (%rdx), %xmm0, %xmm0
				; CHECK-NEXT: retq
				%res = call <2 x double> @llvm.x86.fma.vfmadd.sd(<2 x double> %a, <2 x double> %b, <2 x double> %a) nounwind
				ret <2 x double> %res
				}

				define <2 x double> @test_x86_fmadd_bba_sd(<2 x double> %a, <2 x double> %b) #0 {
				; CHECK-LABEL: test_x86_fmadd_bba_sd:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovaps (%rdx), %xmm0
				; CHECK-NEXT: vfmadd213sd (%rcx), %xmm0, %xmm0
				; CHECK-NEXT: retq
				%res = call <2 x double> @llvm.x86.fma.vfmadd.sd(<2 x double> %b, <2 x double> %b, <2 x double> %a) nounwind
				ret <2 x double> %res
				}

	declare <2 x double> @llvm.x86.fma.vfmadd.pd(<2 x double>, <2 x double>, <2 x double>) nounwind readnone			declare <2 x double> @llvm.x86.fma.vfmadd.pd(<2 x double>, <2 x double>, <2 x double>) nounwind readnone
	define <2 x double> @test_x86_fmadd_baa_pd(<2 x double> %a, <2 x double> %b) #0 {			define <2 x double> @test_x86_fmadd_baa_pd(<2 x double> %a, <2 x double> %b) #0 {
	; CHECK-LABEL: test_x86_fmadd_baa_pd:			; CHECK-LABEL: test_x86_fmadd_baa_pd:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: vmovapd (%rcx), %xmm0			; CHECK-NEXT: vmovapd (%rcx), %xmm0
	; CHECK-NEXT: vfmadd132pd (%rdx), %xmm0, %xmm0			; CHECK-NEXT: vfmadd132pd (%rdx), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <2 x double> @llvm.x86.fma.vfmadd.pd(<2 x double> %b, <2 x double> %a, <2 x double> %a) nounwind			%res = call <2 x double> @llvm.x86.fma.vfmadd.pd(<2 x double> %b, <2 x double> %a, <2 x double> %a) nounwind
	...
	; CHECK-NEXT: vmovapd (%rdx), %ymm0			; CHECK-NEXT: vmovapd (%rdx), %ymm0
	; CHECK-NEXT: vfmadd213pd (%rcx), %ymm0, %ymm0			; CHECK-NEXT: vfmadd213pd (%rcx), %ymm0, %ymm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <4 x double> @llvm.x86.fma.vfmadd.pd.256(<4 x double> %b, <4 x double> %b, <4 x double> %a) nounwind			%res = call <4 x double> @llvm.x86.fma.vfmadd.pd.256(<4 x double> %b, <4 x double> %b, <4 x double> %a) nounwind
	ret <4 x double> %res			ret <4 x double> %res
	}			}


				declare <4 x float> @llvm.x86.fma.vfnmadd.ss(<4 x float>, <4 x float>, <4 x float>) nounwind readnone
				define <4 x float> @test_x86_fnmadd_baa_ss(<4 x float> %a, <4 x float> %b) #0 {
				; CHECK-LABEL: test_x86_fnmadd_baa_ss:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%rcx\), %xmm1}}
				; CHECK-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%rcx\), %xmm1}}
				; CHECK-NEXT: vfnmadd213ss %xmm1, %xmm1, %xmm0
				; CHECK-NEXT: retq
				%res = call <4 x float> @llvm.x86.fma.vfnmadd.ss(<4 x float> %b, <4 x float> %a, <4 x float> %a) nounwind
				ret <4 x float> %res
				}

				define <4 x float> @test_x86_fnmadd_aba_ss(<4 x float> %a, <4 x float> %b) #0 {
				; CHECK-LABEL: test_x86_fnmadd_aba_ss:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovaps (%rcx), %xmm0
				; CHECK-NEXT: vfnmadd132ss (%rdx), %xmm0, %xmm0
				; CHECK-NEXT: retq
				%res = call <4 x float> @llvm.x86.fma.vfnmadd.ss(<4 x float> %a, <4 x float> %b, <4 x float> %a) nounwind
				ret <4 x float> %res
				}

				define <4 x float> @test_x86_fnmadd_bba_ss(<4 x float> %a, <4 x float> %b) #0 {
				; CHECK-LABEL: test_x86_fnmadd_bba_ss:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovaps (%rdx), %xmm0
				; CHECK-NEXT: vfnmadd213ss (%rcx), %xmm0, %xmm0
				; CHECK-NEXT: retq
				%res = call <4 x float> @llvm.x86.fma.vfnmadd.ss(<4 x float> %b, <4 x float> %b, <4 x float> %a) nounwind
				ret <4 x float> %res
				}

	declare <4 x float> @llvm.x86.fma.vfnmadd.ps(<4 x float>, <4 x float>, <4 x float>) nounwind readnone			declare <4 x float> @llvm.x86.fma.vfnmadd.ps(<4 x float>, <4 x float>, <4 x float>) nounwind readnone
	define <4 x float> @test_x86_fnmadd_baa_ps(<4 x float> %a, <4 x float> %b) #0 {			define <4 x float> @test_x86_fnmadd_baa_ps(<4 x float> %a, <4 x float> %b) #0 {
	; CHECK-LABEL: test_x86_fnmadd_baa_ps:			; CHECK-LABEL: test_x86_fnmadd_baa_ps:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: vmovaps (%rcx), %xmm0			; CHECK-NEXT: vmovaps (%rcx), %xmm0
	; CHECK-NEXT: vfnmadd132ps (%rdx), %xmm0, %xmm0			; CHECK-NEXT: vfnmadd132ps (%rdx), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	...
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: vmovaps (%rdx), %ymm0			; CHECK-NEXT: vmovaps (%rdx), %ymm0
	; CHECK-NEXT: vfnmadd213ps (%rcx), %ymm0, %ymm0			; CHECK-NEXT: vfnmadd213ps (%rcx), %ymm0, %ymm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <8 x float> @llvm.x86.fma.vfnmadd.ps.256(<8 x float> %b, <8 x float> %b, <8 x float> %a) nounwind			%res = call <8 x float> @llvm.x86.fma.vfnmadd.ps.256(<8 x float> %b, <8 x float> %b, <8 x float> %a) nounwind
	ret <8 x float> %res			ret <8 x float> %res
	}			}

				declare <2 x double> @llvm.x86.fma.vfnmadd.sd(<2 x double>, <2 x double>, <2 x double>) nounwind readnone
				define <2 x double> @test_x86_fnmadd_baa_sd(<2 x double> %a, <2 x double> %b) #0 {
				; CHECK-LABEL: test_x86_fnmadd_baa_sd:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%rcx\), %xmm1}}
				; CHECK-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%rcx\), %xmm1}}
				; CHECK-NEXT: vfnmadd213sd %xmm1, %xmm1, %xmm0
				; CHECK-NEXT: retq
				%res = call <2 x double> @llvm.x86.fma.vfnmadd.sd(<2 x double> %b, <2 x double> %a, <2 x double> %a) nounwind
				ret <2 x double> %res
				}

				define <2 x double> @test_x86_fnmadd_aba_sd(<2 x double> %a, <2 x double> %b) #0 {
				; CHECK-LABEL: test_x86_fnmadd_aba_sd:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovaps (%rcx), %xmm0
				; CHECK-NEXT: vfnmadd132sd (%rdx), %xmm0, %xmm0
				; CHECK-NEXT: retq
				%res = call <2 x double> @llvm.x86.fma.vfnmadd.sd(<2 x double> %a, <2 x double> %b, <2 x double> %a) nounwind
				ret <2 x double> %res
				}

				define <2 x double> @test_x86_fnmadd_bba_sd(<2 x double> %a, <2 x double> %b) #0 {
				; CHECK-LABEL: test_x86_fnmadd_bba_sd:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovaps (%rdx), %xmm0
				; CHECK-NEXT: vfnmadd213sd (%rcx), %xmm0, %xmm0
				; CHECK-NEXT: retq
				%res = call <2 x double> @llvm.x86.fma.vfnmadd.sd(<2 x double> %b, <2 x double> %b, <2 x double> %a) nounwind
				ret <2 x double> %res
				}

	declare <2 x double> @llvm.x86.fma.vfnmadd.pd(<2 x double>, <2 x double>, <2 x double>) nounwind readnone			declare <2 x double> @llvm.x86.fma.vfnmadd.pd(<2 x double>, <2 x double>, <2 x double>) nounwind readnone
	define <2 x double> @test_x86_fnmadd_baa_pd(<2 x double> %a, <2 x double> %b) #0 {			define <2 x double> @test_x86_fnmadd_baa_pd(<2 x double> %a, <2 x double> %b) #0 {
	; CHECK-LABEL: test_x86_fnmadd_baa_pd:			; CHECK-LABEL: test_x86_fnmadd_baa_pd:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: vmovapd (%rcx), %xmm0			; CHECK-NEXT: vmovapd (%rcx), %xmm0
	; CHECK-NEXT: vfnmadd132pd (%rdx), %xmm0, %xmm0			; CHECK-NEXT: vfnmadd132pd (%rdx), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <2 x double> @llvm.x86.fma.vfnmadd.pd(<2 x double> %b, <2 x double> %a, <2 x double> %a) nounwind			%res = call <2 x double> @llvm.x86.fma.vfnmadd.pd(<2 x double> %b, <2 x double> %a, <2 x double> %a) nounwind
	...
	; CHECK-NEXT: vmovapd (%rdx), %ymm0			; CHECK-NEXT: vmovapd (%rdx), %ymm0
	; CHECK-NEXT: vfnmadd213pd (%rcx), %ymm0, %ymm0			; CHECK-NEXT: vfnmadd213pd (%rcx), %ymm0, %ymm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <4 x double> @llvm.x86.fma.vfnmadd.pd.256(<4 x double> %b, <4 x double> %b, <4 x double> %a) nounwind			%res = call <4 x double> @llvm.x86.fma.vfnmadd.pd.256(<4 x double> %b, <4 x double> %b, <4 x double> %a) nounwind
	ret <4 x double> %res			ret <4 x double> %res
	}			}


				declare <4 x float> @llvm.x86.fma.vfmsub.ss(<4 x float>, <4 x float>, <4 x float>) nounwind readnone
				define <4 x float> @test_x86_fmsub_baa_ss(<4 x float> %a, <4 x float> %b) #0 {
				; CHECK-LABEL: test_x86_fmsub_baa_ss:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%rcx\), %xmm1}}
				; CHECK-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%rcx\), %xmm1}}
				; CHECK-NEXT: vfmsub213ss %xmm1, %xmm1, %xmm0
				; CHECK-NEXT: retq
				%res = call <4 x float> @llvm.x86.fma.vfmsub.ss(<4 x float> %b, <4 x float> %a, <4 x float> %a) nounwind
				ret <4 x float> %res
				}

				define <4 x float> @test_x86_fmsub_aba_ss(<4 x float> %a, <4 x float> %b) #0 {
				; CHECK-LABEL: test_x86_fmsub_aba_ss:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovaps (%rcx), %xmm0
				; CHECK-NEXT: vfmsub132ss (%rdx), %xmm0, %xmm0
				; CHECK-NEXT: retq
				%res = call <4 x float> @llvm.x86.fma.vfmsub.ss(<4 x float> %a, <4 x float> %b, <4 x float> %a) nounwind
				ret <4 x float> %res
				}

				define <4 x float> @test_x86_fmsub_bba_ss(<4 x float> %a, <4 x float> %b) #0 {
				; CHECK-LABEL: test_x86_fmsub_bba_ss:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovaps (%rdx), %xmm0
				; CHECK-NEXT: vfmsub213ss (%rcx), %xmm0, %xmm0
				; CHECK-NEXT: retq
				%res = call <4 x float> @llvm.x86.fma.vfmsub.ss(<4 x float> %b, <4 x float> %b, <4 x float> %a) nounwind
				ret <4 x float> %res
				}

	declare <4 x float> @llvm.x86.fma.vfmsub.ps(<4 x float>, <4 x float>, <4 x float>) nounwind readnone			declare <4 x float> @llvm.x86.fma.vfmsub.ps(<4 x float>, <4 x float>, <4 x float>) nounwind readnone
	define <4 x float> @test_x86_fmsub_baa_ps(<4 x float> %a, <4 x float> %b) #0 {			define <4 x float> @test_x86_fmsub_baa_ps(<4 x float> %a, <4 x float> %b) #0 {
	; CHECK-LABEL: test_x86_fmsub_baa_ps:			; CHECK-LABEL: test_x86_fmsub_baa_ps:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: vmovaps (%rcx), %xmm0			; CHECK-NEXT: vmovaps (%rcx), %xmm0
	; CHECK-NEXT: vfmsub132ps (%rdx), %xmm0, %xmm0			; CHECK-NEXT: vfmsub132ps (%rdx), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <4 x float> @llvm.x86.fma.vfmsub.ps(<4 x float> %b, <4 x float> %a, <4 x float> %a) nounwind			%res = call <4 x float> @llvm.x86.fma.vfmsub.ps(<4 x float> %b, <4 x float> %a, <4 x float> %a) nounwind
	...
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: vmovaps (%rdx), %ymm0			; CHECK-NEXT: vmovaps (%rdx), %ymm0
	; CHECK-NEXT: vfmsub213ps (%rcx), %ymm0, %ymm0			; CHECK-NEXT: vfmsub213ps (%rcx), %ymm0, %ymm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <8 x float> @llvm.x86.fma.vfmsub.ps.256(<8 x float> %b, <8 x float> %b, <8 x float> %a) nounwind			%res = call <8 x float> @llvm.x86.fma.vfmsub.ps.256(<8 x float> %b, <8 x float> %b, <8 x float> %a) nounwind
	ret <8 x float> %res			ret <8 x float> %res
	}			}

				declare <2 x double> @llvm.x86.fma.vfmsub.sd(<2 x double>, <2 x double>, <2 x double>) nounwind readnone
				define <2 x double> @test_x86_fmsub_baa_sd(<2 x double> %a, <2 x double> %b) #0 {
				; CHECK-LABEL: test_x86_fmsub_baa_sd:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%rcx\), %xmm1}}
				; CHECK-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%rcx\), %xmm1}}
				; CHECK-NEXT: vfmsub213sd %xmm1, %xmm1, %xmm0
				; CHECK-NEXT: retq
				%res = call <2 x double> @llvm.x86.fma.vfmsub.sd(<2 x double> %b, <2 x double> %a, <2 x double> %a) nounwind
				ret <2 x double> %res
				}

				define <2 x double> @test_x86_fmsub_aba_sd(<2 x double> %a, <2 x double> %b) #0 {
				; CHECK-LABEL: test_x86_fmsub_aba_sd:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovaps (%rcx), %xmm0
				; CHECK-NEXT: vfmsub132sd (%rdx), %xmm0, %xmm0
				; CHECK-NEXT: retq
				%res = call <2 x double> @llvm.x86.fma.vfmsub.sd(<2 x double> %a, <2 x double> %b, <2 x double> %a) nounwind
				ret <2 x double> %res
				}

				define <2 x double> @test_x86_fmsub_bba_sd(<2 x double> %a, <2 x double> %b) #0 {
				; CHECK-LABEL: test_x86_fmsub_bba_sd:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovaps (%rdx), %xmm0
				; CHECK-NEXT: vfmsub213sd (%rcx), %xmm0, %xmm0
				; CHECK-NEXT: retq
				%res = call <2 x double> @llvm.x86.fma.vfmsub.sd(<2 x double> %b, <2 x double> %b, <2 x double> %a) nounwind
				ret <2 x double> %res
				}

	declare <2 x double> @llvm.x86.fma.vfmsub.pd(<2 x double>, <2 x double>, <2 x double>) nounwind readnone			declare <2 x double> @llvm.x86.fma.vfmsub.pd(<2 x double>, <2 x double>, <2 x double>) nounwind readnone
	define <2 x double> @test_x86_fmsub_baa_pd(<2 x double> %a, <2 x double> %b) #0 {			define <2 x double> @test_x86_fmsub_baa_pd(<2 x double> %a, <2 x double> %b) #0 {
	; CHECK-LABEL: test_x86_fmsub_baa_pd:			; CHECK-LABEL: test_x86_fmsub_baa_pd:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: vmovapd (%rcx), %xmm0			; CHECK-NEXT: vmovapd (%rcx), %xmm0
	; CHECK-NEXT: vfmsub132pd (%rdx), %xmm0, %xmm0			; CHECK-NEXT: vfmsub132pd (%rdx), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <2 x double> @llvm.x86.fma.vfmsub.pd(<2 x double> %b, <2 x double> %a, <2 x double> %a) nounwind			%res = call <2 x double> @llvm.x86.fma.vfmsub.pd(<2 x double> %b, <2 x double> %a, <2 x double> %a) nounwind
	...
	; CHECK-NEXT: vmovapd (%rdx), %ymm0			; CHECK-NEXT: vmovapd (%rdx), %ymm0
	; CHECK-NEXT: vfmsub213pd (%rcx), %ymm0, %ymm0			; CHECK-NEXT: vfmsub213pd (%rcx), %ymm0, %ymm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <4 x double> @llvm.x86.fma.vfmsub.pd.256(<4 x double> %b, <4 x double> %b, <4 x double> %a) nounwind			%res = call <4 x double> @llvm.x86.fma.vfmsub.pd.256(<4 x double> %b, <4 x double> %b, <4 x double> %a) nounwind
	ret <4 x double> %res			ret <4 x double> %res
	}			}


				declare <4 x float> @llvm.x86.fma.vfnmsub.ss(<4 x float>, <4 x float>, <4 x float>) nounwind readnone
				define <4 x float> @test_x86_fnmsub_baa_ss(<4 x float> %a, <4 x float> %b) #0 {
				; CHECK-LABEL: test_x86_fnmsub_baa_ss:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%rcx\), %xmm1}}
				; CHECK-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%rcx\), %xmm1}}
				; CHECK-NEXT: vfnmsub213ss %xmm1, %xmm1, %xmm0
				; CHECK-NEXT: retq
				%res = call <4 x float> @llvm.x86.fma.vfnmsub.ss(<4 x float> %b, <4 x float> %a, <4 x float> %a) nounwind
				ret <4 x float> %res
				}

				define <4 x float> @test_x86_fnmsub_aba_ss(<4 x float> %a, <4 x float> %b) #0 {
				; CHECK-LABEL: test_x86_fnmsub_aba_ss:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovaps (%rcx), %xmm0
				; CHECK-NEXT: vfnmsub132ss (%rdx), %xmm0, %xmm0
				; CHECK-NEXT: retq
				%res = call <4 x float> @llvm.x86.fma.vfnmsub.ss(<4 x float> %a, <4 x float> %b, <4 x float> %a) nounwind
				ret <4 x float> %res
				}

				define <4 x float> @test_x86_fnmsub_bba_ss(<4 x float> %a, <4 x float> %b) #0 {
				; CHECK-LABEL: test_x86_fnmsub_bba_ss:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovaps (%rdx), %xmm0
				; CHECK-NEXT: vfnmsub213ss (%rcx), %xmm0, %xmm0
				; CHECK-NEXT: retq
				%res = call <4 x float> @llvm.x86.fma.vfnmsub.ss(<4 x float> %b, <4 x float> %b, <4 x float> %a) nounwind
				ret <4 x float> %res
				}

	declare <4 x float> @llvm.x86.fma.vfnmsub.ps(<4 x float>, <4 x float>, <4 x float>) nounwind readnone			declare <4 x float> @llvm.x86.fma.vfnmsub.ps(<4 x float>, <4 x float>, <4 x float>) nounwind readnone
	define <4 x float> @test_x86_fnmsub_baa_ps(<4 x float> %a, <4 x float> %b) #0 {			define <4 x float> @test_x86_fnmsub_baa_ps(<4 x float> %a, <4 x float> %b) #0 {
	; CHECK-LABEL: test_x86_fnmsub_baa_ps:			; CHECK-LABEL: test_x86_fnmsub_baa_ps:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: vmovaps (%rcx), %xmm0			; CHECK-NEXT: vmovaps (%rcx), %xmm0
	; CHECK-NEXT: vfnmsub132ps (%rdx), %xmm0, %xmm0			; CHECK-NEXT: vfnmsub132ps (%rdx), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <4 x float> @llvm.x86.fma.vfnmsub.ps(<4 x float> %b, <4 x float> %a, <4 x float> %a) nounwind			%res = call <4 x float> @llvm.x86.fma.vfnmsub.ps(<4 x float> %b, <4 x float> %a, <4 x float> %a) nounwind
	...
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: vmovaps (%rdx), %ymm0			; CHECK-NEXT: vmovaps (%rdx), %ymm0
	; CHECK-NEXT: vfnmsub213ps (%rcx), %ymm0, %ymm0			; CHECK-NEXT: vfnmsub213ps (%rcx), %ymm0, %ymm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <8 x float> @llvm.x86.fma.vfnmsub.ps.256(<8 x float> %b, <8 x float> %b, <8 x float> %a) nounwind			%res = call <8 x float> @llvm.x86.fma.vfnmsub.ps.256(<8 x float> %b, <8 x float> %b, <8 x float> %a) nounwind
	ret <8 x float> %res			ret <8 x float> %res
	}			}

				declare <2 x double> @llvm.x86.fma.vfnmsub.sd(<2 x double>, <2 x double>, <2 x double>) nounwind readnone
				define <2 x double> @test_x86_fnmsub_baa_sd(<2 x double> %a, <2 x double> %b) #0 {
				; CHECK-LABEL: test_x86_fnmsub_baa_sd:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%rcx\), %xmm1}}
				; CHECK-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%rcx\), %xmm1}}
				; CHECK-NEXT: vfnmsub213sd %xmm1, %xmm1, %xmm0
				; CHECK-NEXT: retq
				%res = call <2 x double> @llvm.x86.fma.vfnmsub.sd(<2 x double> %b, <2 x double> %a, <2 x double> %a) nounwind
				ret <2 x double> %res
				}

				define <2 x double> @test_x86_fnmsub_aba_sd(<2 x double> %a, <2 x double> %b) #0 {
				; CHECK-LABEL: test_x86_fnmsub_aba_sd:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovaps (%rcx), %xmm0
				; CHECK-NEXT: vfnmsub132sd (%rdx), %xmm0, %xmm0
				; CHECK-NEXT: retq
				%res = call <2 x double> @llvm.x86.fma.vfnmsub.sd(<2 x double> %a, <2 x double> %b, <2 x double> %a) nounwind
				ret <2 x double> %res
				}

				define <2 x double> @test_x86_fnmsub_bba_sd(<2 x double> %a, <2 x double> %b) #0 {
				; CHECK-LABEL: test_x86_fnmsub_bba_sd:
				; CHECK: # BB#0:
				; CHECK-NEXT: vmovaps (%rdx), %xmm0
				; CHECK-NEXT: vfnmsub213sd (%rcx), %xmm0, %xmm0
				; CHECK-NEXT: retq
				%res = call <2 x double> @llvm.x86.fma.vfnmsub.sd(<2 x double> %b, <2 x double> %b, <2 x double> %a) nounwind
				ret <2 x double> %res
				}

	declare <2 x double> @llvm.x86.fma.vfnmsub.pd(<2 x double>, <2 x double>, <2 x double>) nounwind readnone			declare <2 x double> @llvm.x86.fma.vfnmsub.pd(<2 x double>, <2 x double>, <2 x double>) nounwind readnone
	define <2 x double> @test_x86_fnmsub_baa_pd(<2 x double> %a, <2 x double> %b) #0 {			define <2 x double> @test_x86_fnmsub_baa_pd(<2 x double> %a, <2 x double> %b) #0 {
	; CHECK-LABEL: test_x86_fnmsub_baa_pd:			; CHECK-LABEL: test_x86_fnmsub_baa_pd:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: vmovapd (%rcx), %xmm0			; CHECK-NEXT: vmovapd (%rcx), %xmm0
	; CHECK-NEXT: vfnmsub132pd (%rdx), %xmm0, %xmm0			; CHECK-NEXT: vfnmsub132pd (%rdx), %xmm0, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <2 x double> @llvm.x86.fma.vfnmsub.pd(<2 x double> %b, <2 x double> %a, <2 x double> %a) nounwind			%res = call <2 x double> @llvm.x86.fma.vfnmsub.pd(<2 x double> %b, <2 x double> %a, <2 x double> %a) nounwind
	...

llvm/trunk/test/CodeGen/X86/fma-intrinsics-x86.ll

	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -march=x86-64 -mcpu=corei7-avx -mattr=+fma | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-FMA			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -march=x86-64 -mcpu=corei7-avx -mattr=+fma | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-FMA
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -march=x86-64 -mcpu=core-avx2 -mattr=+fma,+avx2 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-FMA			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -march=x86-64 -mcpu=core-avx2 -mattr=+fma,+avx2 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-FMA
	; RUN: llc < %s -mtriple=x86_64-pc-windows -march=x86-64 -mcpu=core-avx2 -mattr=+fma,+avx2 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-FMA-WIN			; RUN: llc < %s -mtriple=x86_64-pc-windows -march=x86-64 -mcpu=core-avx2 -mattr=+fma,+avx2 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-FMA-WIN
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -march=x86-64 -mcpu=corei7-avx -mattr=+fma4 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-FMA4			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -march=x86-64 -mcpu=corei7-avx -mattr=+fma4 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-FMA4
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=bdver2 -mattr=+avx,-fma | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-FMA4			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=bdver2 -mattr=+avx,-fma | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-FMA4
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=bdver2 -mattr=-fma4 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-FMA			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=bdver2 -mattr=-fma4 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-FMA

	; VFMADD			; VFMADD
	define <4 x float> @test_x86_fma_vfmadd_ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) #0 {			define <4 x float> @test_x86_fma_vfmadd_ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) #0 {
	; CHECK-LABEL: test_x86_fma_vfmadd_ss:			; CHECK-LABEL: test_x86_fma_vfmadd_ss:
	; CHECK-NEXT: # BB#0:			; CHECK-NEXT: # BB#0:
	;			;
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdi)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rcx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rcx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vfmadd213ss (%r8), %xmm1, %xmm0			; CHECK-FMA-WIN-NEXT: vfmadd132ss (%rdx), %xmm1, %xmm0
	;			;
	; CHECK-FMA-NEXT: vfmadd213ss %xmm2, %xmm1, %xmm0			; CHECK-FMA-NEXT: vfmadd213ss %xmm2, %xmm1, %xmm0
	;			;
	; CHECK-FMA4-NEXT: vfmaddss %xmm2, %xmm1, %xmm0, %xmm0			; CHECK-FMA4-NEXT: vfmaddss %xmm2, %xmm1, %xmm0, %xmm0
	;			;
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <4 x float> @llvm.x86.fma.vfmadd.ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2)			%res = call <4 x float> @llvm.x86.fma.vfmadd.ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2)
	ret <4 x float> %res			ret <4 x float> %res
	}			}
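
The CHECK-FMA-WIN change from `vfmadd213ss (%r8)` to `vfmadd132ss (%rdx)` follows the folding order described in the summary: folding is attempted from the 1st operand onwards, the 1st operand of a scalar intrinsic cannot be moved into the foldable 3rd slot, and the 2nd operand can. The toy model below sketches only that decision. It is not PeepholeOptimizer code, it assumes every operand is a candidate load (as in the Windows runs, where the checks read all three vectors through pointers), and the names FmaForm, moveToFoldableSlot and firstFoldableOperand are hypothetical.

```cpp
// Hypothetical form tags; the values double as array indices.
enum FmaForm : unsigned { F132 = 0, F213 = 1, F231 = 2 };

// Returns true and sets NewForm if source operand OpIdx (1-based) can be
// placed in the third slot, the only slot assumed to accept a memory operand.
bool moveToFoldableSlot(FmaForm F, unsigned OpIdx, bool IsScalarIntrinsic,
                        FmaForm &NewForm) {
  // Form produced by swapping operands 1<->3 and 2<->3, indexed by input form.
  static const FmaForm Swap13[3] = {F132, F231, F213};
  static const FmaForm Swap23[3] = {F213, F132, F231};
  if (OpIdx == 3) { NewForm = F; return true; }
  if (OpIdx == 2) { NewForm = Swap23[F]; return true; }
  // Operand 1 of a scalar intrinsic supplies the upper lanes, so it stays put.
  if (OpIdx == 1 && !IsScalarIntrinsic) { NewForm = Swap13[F]; return true; }
  return false;
}

// Walks operands 1, 2, 3 in that order and returns the first one that can
// be folded, or 0 if none can. NewForm is only meaningful on success.
unsigned firstFoldableOperand(FmaForm F, bool IsScalarIntrinsic,
                              FmaForm &NewForm) {
  for (unsigned OpIdx = 1; OpIdx <= 3; ++OpIdx)
    if (moveToFoldableSlot(F, OpIdx, IsScalarIntrinsic, NewForm))
      return OpIdx;
  return 0;
}
```

For a scalar intrinsic that starts in 213 form, this model picks operand 2 and the 132 form, matching the updated check where the folded address is `(%rdx)`, the second argument. What the model does for non-intrinsic opcodes is just a consequence of the toy rules and is not meant to predict the packed test output.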

	define <4 x float> @test_x86_fma_vfmadd_bac_ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) #0 {			define <4 x float> @test_x86_fma_vfmadd_bac_ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) #0 {
	; CHECK-LABEL: test_x86_fma_vfmadd_bac_ss:			; CHECK-LABEL: test_x86_fma_vfmadd_bac_ss:
	; CHECK-NEXT: # BB#0:			; CHECK-NEXT: # BB#0:
	;			;
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vfmadd213ss (%r8), %xmm1, %xmm0			; CHECK-FMA-WIN-NEXT: vfmadd132ss (%rcx), %xmm1, %xmm0
	;			;
	; CHECK-FMA-NEXT: vfmadd213ss %xmm2, %xmm0, %xmm1			; CHECK-FMA-NEXT: vfmadd213ss %xmm2, %xmm0, %xmm1
	; CHECK-FMA-NEXT: vmovaps %xmm1, %xmm0			; CHECK-FMA-NEXT: vmovaps %xmm1, %xmm0
	;			;
	; CHECK-FMA4-NEXT: vfmaddss %xmm2, %xmm0, %xmm1, %xmm0			; CHECK-FMA4-NEXT: vfmaddss %xmm2, %xmm0, %xmm1, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <4 x float> @llvm.x86.fma.vfmadd.ss(<4 x float> %a1, <4 x float> %a0, <4 x float> %a2)			%res = call <4 x float> @llvm.x86.fma.vfmadd.ss(<4 x float> %a1, <4 x float> %a0, <4 x float> %a2)
	ret <4 x float> %res			ret <4 x float> %res
	}			}
	declare <4 x float> @llvm.x86.fma.vfmadd.ss(<4 x float>, <4 x float>, <4 x float>)			declare <4 x float> @llvm.x86.fma.vfmadd.ss(<4 x float>, <4 x float>, <4 x float>)

	define <2 x double> @test_x86_fma_vfmadd_sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2) #0 {			define <2 x double> @test_x86_fma_vfmadd_sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2) #0 {
	; CHECK-LABEL: test_x86_fma_vfmadd_sd:			; CHECK-LABEL: test_x86_fma_vfmadd_sd:
	; CHECK-NEXT: # BB#0:			; CHECK-NEXT: # BB#0:
	;			;
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovaps {{\(%rcx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovaps {{\(%rcx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vfmadd213sd (%r8), %xmm1, %xmm0			; CHECK-FMA-WIN-NEXT: vfmadd132sd (%rdx), %xmm1, %xmm0
	;			;
	; CHECK-FMA-NEXT: vfmadd213sd %xmm2, %xmm1, %xmm0			; CHECK-FMA-NEXT: vfmadd213sd %xmm2, %xmm1, %xmm0
	;			;
	; CHECK-FMA4-NEXT: vfmaddsd %xmm2, %xmm1, %xmm0, %xmm0			; CHECK-FMA4-NEXT: vfmaddsd %xmm2, %xmm1, %xmm0, %xmm0
	;			;
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <2 x double> @llvm.x86.fma.vfmadd.sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2)			%res = call <2 x double> @llvm.x86.fma.vfmadd.sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2)
	ret <2 x double> %res			ret <2 x double> %res
	}			}

	define <2 x double> @test_x86_fma_vfmadd_bac_sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2) #0 {			define <2 x double> @test_x86_fma_vfmadd_bac_sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2) #0 {
	; CHECK-LABEL: test_x86_fma_vfmadd_bac_sd:			; CHECK-LABEL: test_x86_fma_vfmadd_bac_sd:
	; CHECK-NEXT: # BB#0:			; CHECK-NEXT: # BB#0:
	;			;
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vfmadd213sd (%r8), %xmm1, %xmm0			; CHECK-FMA-WIN-NEXT: vfmadd132sd (%rcx), %xmm1, %xmm0
	;			;
	; CHECK-FMA-NEXT: vfmadd213sd %xmm2, %xmm0, %xmm1			; CHECK-FMA-NEXT: vfmadd213sd %xmm2, %xmm0, %xmm1
	; CHECK-FMA-NEXT: vmovaps %xmm1, %xmm0			; CHECK-FMA-NEXT: vmovaps %xmm1, %xmm0
	;			;
	; CHECK-FMA4-NEXT: vfmaddsd %xmm2, %xmm0, %xmm1, %xmm0			; CHECK-FMA4-NEXT: vfmaddsd %xmm2, %xmm0, %xmm1, %xmm0
	;			;
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <2 x double> @llvm.x86.fma.vfmadd.sd(<2 x double> %a1, <2 x double> %a0, <2 x double> %a2)			%res = call <2 x double> @llvm.x86.fma.vfmadd.sd(<2 x double> %a1, <2 x double> %a0, <2 x double> %a2)
	...
	}			}
	declare <4 x double> @llvm.x86.fma.vfmadd.pd.256(<4 x double>, <4 x double>, <4 x double>)			declare <4 x double> @llvm.x86.fma.vfmadd.pd.256(<4 x double>, <4 x double>, <4 x double>)

	; VFMSUB			; VFMSUB
	define <4 x float> @test_x86_fma_vfmsub_ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) #0 {			define <4 x float> @test_x86_fma_vfmsub_ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) #0 {
	; CHECK-LABEL: test_x86_fma_vfmsub_ss:			; CHECK-LABEL: test_x86_fma_vfmsub_ss:
	; CHECK-NEXT: # BB#0:			; CHECK-NEXT: # BB#0:
	;			;
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovaps {{\(%rcx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovaps {{\(%rcx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vfmsub213ss (%r8), %xmm1, %xmm0			; CHECK-FMA-WIN-NEXT: vfmsub132ss (%rdx), %xmm1, %xmm0
	;			;
	; CHECK-FMA-NEXT: vfmsub213ss %xmm2, %xmm1, %xmm0			; CHECK-FMA-NEXT: vfmsub213ss %xmm2, %xmm1, %xmm0
	;			;
	; CHECK-FMA4-NEXT: vfmsubss %xmm2, %xmm1, %xmm0, %xmm0			; CHECK-FMA4-NEXT: vfmsubss %xmm2, %xmm1, %xmm0, %xmm0
	;			;
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <4 x float> @llvm.x86.fma.vfmsub.ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2)			%res = call <4 x float> @llvm.x86.fma.vfmsub.ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2)
	ret <4 x float> %res			ret <4 x float> %res
	}			}

	define <4 x float> @test_x86_fma_vfmsub_bac_ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) #0 {			define <4 x float> @test_x86_fma_vfmsub_bac_ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) #0 {
	; CHECK-LABEL: test_x86_fma_vfmsub_bac_ss:			; CHECK-LABEL: test_x86_fma_vfmsub_bac_ss:
	; CHECK-NEXT: # BB#0:			; CHECK-NEXT: # BB#0:
	;			;
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vfmsub213ss (%r8), %xmm1, %xmm0			; CHECK-FMA-WIN-NEXT: vfmsub132ss (%rcx), %xmm1, %xmm0
	;			;
	; CHECK-FMA-NEXT: vfmsub213ss %xmm2, %xmm0, %xmm1			; CHECK-FMA-NEXT: vfmsub213ss %xmm2, %xmm0, %xmm1
	; CHECK-FMA-NEXT: vmovaps %xmm1, %xmm0			; CHECK-FMA-NEXT: vmovaps %xmm1, %xmm0
	;			;
	; CHECK-FMA4-NEXT: vfmsubss %xmm2, %xmm0, %xmm1, %xmm0			; CHECK-FMA4-NEXT: vfmsubss %xmm2, %xmm0, %xmm1, %xmm0
	;			;
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <4 x float> @llvm.x86.fma.vfmsub.ss(<4 x float> %a1, <4 x float> %a0, <4 x float> %a2)			%res = call <4 x float> @llvm.x86.fma.vfmsub.ss(<4 x float> %a1, <4 x float> %a0, <4 x float> %a2)
	ret <4 x float> %res			ret <4 x float> %res
	}			}
	declare <4 x float> @llvm.x86.fma.vfmsub.ss(<4 x float>, <4 x float>, <4 x float>)			declare <4 x float> @llvm.x86.fma.vfmsub.ss(<4 x float>, <4 x float>, <4 x float>)

	define <2 x double> @test_x86_fma_vfmsub_sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2) #0 {			define <2 x double> @test_x86_fma_vfmsub_sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2) #0 {
	; CHECK-LABEL: test_x86_fma_vfmsub_sd:			; CHECK-LABEL: test_x86_fma_vfmsub_sd:
	; CHECK-NEXT: # BB#0:			; CHECK-NEXT: # BB#0:
	;			;
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rcx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rcx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vfmsub213sd (%r8), %xmm1, %xmm0			; CHECK-FMA-WIN-NEXT: vfmsub132sd (%rdx), %xmm1, %xmm0
	;			;
	; CHECK-FMA-NEXT: vfmsub213sd %xmm2, %xmm1, %xmm0			; CHECK-FMA-NEXT: vfmsub213sd %xmm2, %xmm1, %xmm0
	;			;
	; CHECK-FMA4-NEXT: vfmsubsd %xmm2, %xmm1, %xmm0, %xmm0			; CHECK-FMA4-NEXT: vfmsubsd %xmm2, %xmm1, %xmm0, %xmm0
	;			;
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <2 x double> @llvm.x86.fma.vfmsub.sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2)			%res = call <2 x double> @llvm.x86.fma.vfmsub.sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2)
	ret <2 x double> %res			ret <2 x double> %res
	}			}

	define <2 x double> @test_x86_fma_vfmsub_bac_sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2) #0 {			define <2 x double> @test_x86_fma_vfmsub_bac_sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2) #0 {
	; CHECK-LABEL: test_x86_fma_vfmsub_bac_sd:			; CHECK-LABEL: test_x86_fma_vfmsub_bac_sd:
	; CHECK-NEXT: # BB#0:			; CHECK-NEXT: # BB#0:
	;			;
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vfmsub213sd (%r8), %xmm1, %xmm0			; CHECK-FMA-WIN-NEXT: vfmsub132sd (%rcx), %xmm1, %xmm0
	;			;
	; CHECK-FMA-NEXT: vfmsub213sd %xmm2, %xmm0, %xmm1			; CHECK-FMA-NEXT: vfmsub213sd %xmm2, %xmm0, %xmm1
	; CHECK-FMA-NEXT: vmovaps %xmm1, %xmm0			; CHECK-FMA-NEXT: vmovaps %xmm1, %xmm0
	;			;
	; CHECK-FMA4-NEXT: vfmsubsd %xmm2, %xmm0, %xmm1, %xmm0			; CHECK-FMA4-NEXT: vfmsubsd %xmm2, %xmm0, %xmm1, %xmm0
	;			;
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <2 x double> @llvm.x86.fma.vfmsub.sd(<2 x double> %a1, <2 x double> %a0, <2 x double> %a2)			%res = call <2 x double> @llvm.x86.fma.vfmsub.sd(<2 x double> %a1, <2 x double> %a0, <2 x double> %a2)
	}			}
	declare <4 x double> @llvm.x86.fma.vfmsub.pd.256(<4 x double>, <4 x double>, <4 x double>)			declare <4 x double> @llvm.x86.fma.vfmsub.pd.256(<4 x double>, <4 x double>, <4 x double>)

	; VFNMADD			; VFNMADD
	define <4 x float> @test_x86_fma_vfnmadd_ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) #0 {			define <4 x float> @test_x86_fma_vfnmadd_ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) #0 {
	; CHECK-LABEL: test_x86_fma_vfnmadd_ss:			; CHECK-LABEL: test_x86_fma_vfnmadd_ss:
	; CHECK-NEXT: # BB#0:			; CHECK-NEXT: # BB#0:
	;			;
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rcx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rcx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vfnmadd213ss (%r8), %xmm1, %xmm0			; CHECK-FMA-WIN-NEXT: vfnmadd132ss (%rdx), %xmm1, %xmm0
	;			;
	; CHECK-FMA-NEXT: vfnmadd213ss %xmm2, %xmm1, %xmm0			; CHECK-FMA-NEXT: vfnmadd213ss %xmm2, %xmm1, %xmm0
	;			;
	; CHECK-FMA4-NEXT: vfnmaddss %xmm2, %xmm1, %xmm0, %xmm0			; CHECK-FMA4-NEXT: vfnmaddss %xmm2, %xmm1, %xmm0, %xmm0
	;			;
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <4 x float> @llvm.x86.fma.vfnmadd.ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2)			%res = call <4 x float> @llvm.x86.fma.vfnmadd.ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2)
	ret <4 x float> %res			ret <4 x float> %res
	}			}

	define <4 x float> @test_x86_fma_vfnmadd_bac_ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) #0 {			define <4 x float> @test_x86_fma_vfnmadd_bac_ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) #0 {
	; CHECK-LABEL: test_x86_fma_vfnmadd_bac_ss:			; CHECK-LABEL: test_x86_fma_vfnmadd_bac_ss:
	; CHECK-NEXT: # BB#0:			; CHECK-NEXT: # BB#0:
	;			;
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vfnmadd213ss (%r8), %xmm1, %xmm0			; CHECK-FMA-WIN-NEXT: vfnmadd132ss (%rcx), %xmm1, %xmm0
	;			;
	; CHECK-FMA-NEXT: vfnmadd213ss %xmm2, %xmm0, %xmm1			; CHECK-FMA-NEXT: vfnmadd213ss %xmm2, %xmm0, %xmm1
	; CHECK-FMA-NEXT: vmovaps %xmm1, %xmm0			; CHECK-FMA-NEXT: vmovaps %xmm1, %xmm0
	;			;
	; CHECK-FMA4-NEXT: vfnmaddss %xmm2, %xmm0, %xmm1, %xmm0			; CHECK-FMA4-NEXT: vfnmaddss %xmm2, %xmm0, %xmm1, %xmm0
	;			;
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <4 x float> @llvm.x86.fma.vfnmadd.ss(<4 x float> %a1, <4 x float> %a0, <4 x float> %a2)			%res = call <4 x float> @llvm.x86.fma.vfnmadd.ss(<4 x float> %a1, <4 x float> %a0, <4 x float> %a2)
	ret <4 x float> %res			ret <4 x float> %res
	}			}
	declare <4 x float> @llvm.x86.fma.vfnmadd.ss(<4 x float>, <4 x float>, <4 x float>)			declare <4 x float> @llvm.x86.fma.vfnmadd.ss(<4 x float>, <4 x float>, <4 x float>)

	define <2 x double> @test_x86_fma_vfnmadd_sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2) #0 {			define <2 x double> @test_x86_fma_vfnmadd_sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2) #0 {
	; CHECK-LABEL: test_x86_fma_vfnmadd_sd:			; CHECK-LABEL: test_x86_fma_vfnmadd_sd:
	; CHECK-NEXT: # BB#0:			; CHECK-NEXT: # BB#0:
	;			;
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rcx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rcx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vfnmadd213sd (%r8), %xmm1, %xmm0			; CHECK-FMA-WIN-NEXT: vfnmadd132sd (%rdx), %xmm1, %xmm0
	;			;
	; CHECK-FMA-NEXT: vfnmadd213sd %xmm2, %xmm1, %xmm0			; CHECK-FMA-NEXT: vfnmadd213sd %xmm2, %xmm1, %xmm0
	;			;
	; CHECK-FMA4-NEXT: vfnmaddsd %xmm2, %xmm1, %xmm0, %xmm0			; CHECK-FMA4-NEXT: vfnmaddsd %xmm2, %xmm1, %xmm0, %xmm0
	;			;
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <2 x double> @llvm.x86.fma.vfnmadd.sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2)			%res = call <2 x double> @llvm.x86.fma.vfnmadd.sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2)
	ret <2 x double> %res			ret <2 x double> %res
	}			}

	define <2 x double> @test_x86_fma_vfnmadd_bac_sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2) #0 {			define <2 x double> @test_x86_fma_vfnmadd_bac_sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2) #0 {
	; CHECK-LABEL: test_x86_fma_vfnmadd_bac_sd:			; CHECK-LABEL: test_x86_fma_vfnmadd_bac_sd:
	; CHECK-NEXT: # BB#0:			; CHECK-NEXT: # BB#0:
	;			;
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vfnmadd213sd (%r8), %xmm1, %xmm0			; CHECK-FMA-WIN-NEXT: vfnmadd132sd (%rcx), %xmm1, %xmm0
	;			;
	; CHECK-FMA-NEXT: vfnmadd213sd %xmm2, %xmm0, %xmm1			; CHECK-FMA-NEXT: vfnmadd213sd %xmm2, %xmm0, %xmm1
	; CHECK-FMA-NEXT: vmovaps %xmm1, %xmm0			; CHECK-FMA-NEXT: vmovaps %xmm1, %xmm0
	;			;
	; CHECK-FMA4-NEXT: vfnmaddsd %xmm2, %xmm0, %xmm1, %xmm0			; CHECK-FMA4-NEXT: vfnmaddsd %xmm2, %xmm0, %xmm1, %xmm0
	;			;
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <2 x double> @llvm.x86.fma.vfnmadd.sd(<2 x double> %a1, <2 x double> %a0, <2 x double> %a2)			%res = call <2 x double> @llvm.x86.fma.vfnmadd.sd(<2 x double> %a1, <2 x double> %a0, <2 x double> %a2)
	}			}
	declare <4 x double> @llvm.x86.fma.vfnmadd.pd.256(<4 x double>, <4 x double>, <4 x double>)			declare <4 x double> @llvm.x86.fma.vfnmadd.pd.256(<4 x double>, <4 x double>, <4 x double>)

	; VFNMSUB			; VFNMSUB
	define <4 x float> @test_x86_fma_vfnmsub_ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) #0 {			define <4 x float> @test_x86_fma_vfnmsub_ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) #0 {
	; CHECK-LABEL: test_x86_fma_vfnmsub_ss:			; CHECK-LABEL: test_x86_fma_vfnmsub_ss:
	; CHECK-NEXT: # BB#0:			; CHECK-NEXT: # BB#0:
	;			;
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rcx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rcx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vfnmsub213ss (%r8), %xmm1, %xmm0			; CHECK-FMA-WIN-NEXT: vfnmsub132ss (%rdx), %xmm1, %xmm0
	;			;
	; CHECK-FMA-NEXT: vfnmsub213ss %xmm2, %xmm1, %xmm0			; CHECK-FMA-NEXT: vfnmsub213ss %xmm2, %xmm1, %xmm0
	;			;
	; CHECK-FMA4-NEXT: vfnmsubss %xmm2, %xmm1, %xmm0, %xmm0			; CHECK-FMA4-NEXT: vfnmsubss %xmm2, %xmm1, %xmm0, %xmm0
	;			;
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <4 x float> @llvm.x86.fma.vfnmsub.ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2)			%res = call <4 x float> @llvm.x86.fma.vfnmsub.ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2)
	ret <4 x float> %res			ret <4 x float> %res
	}			}

	define <4 x float> @test_x86_fma_vfnmsub_bac_ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) #0 {			define <4 x float> @test_x86_fma_vfnmsub_bac_ss(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) #0 {
	; CHECK-LABEL: test_x86_fma_vfnmsub_bac_ss:			; CHECK-LABEL: test_x86_fma_vfnmsub_bac_ss:
	; CHECK-NEXT: # BB#0:			; CHECK-NEXT: # BB#0:
	;			;
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vfnmsub213ss (%r8), %xmm1, %xmm0			; CHECK-FMA-WIN-NEXT: vfnmsub132ss (%rcx), %xmm1, %xmm0
	;			;
	; CHECK-FMA-NEXT: vfnmsub213ss %xmm2, %xmm0, %xmm1			; CHECK-FMA-NEXT: vfnmsub213ss %xmm2, %xmm0, %xmm1
	; CHECK-FMA-NEXT: vmovaps %xmm1, %xmm0			; CHECK-FMA-NEXT: vmovaps %xmm1, %xmm0
	;			;
	; CHECK-FMA4-NEXT: vfnmsubss %xmm2, %xmm0, %xmm1, %xmm0			; CHECK-FMA4-NEXT: vfnmsubss %xmm2, %xmm0, %xmm1, %xmm0
	;			;
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <4 x float> @llvm.x86.fma.vfnmsub.ss(<4 x float> %a1, <4 x float> %a0, <4 x float> %a2)			%res = call <4 x float> @llvm.x86.fma.vfnmsub.ss(<4 x float> %a1, <4 x float> %a0, <4 x float> %a2)
	ret <4 x float> %res			ret <4 x float> %res
	}			}
	declare <4 x float> @llvm.x86.fma.vfnmsub.ss(<4 x float>, <4 x float>, <4 x float>)			declare <4 x float> @llvm.x86.fma.vfnmsub.ss(<4 x float>, <4 x float>, <4 x float>)

	define <2 x double> @test_x86_fma_vfnmsub_sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2) #0 {			define <2 x double> @test_x86_fma_vfnmsub_sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2) #0 {
	; CHECK-LABEL: test_x86_fma_vfnmsub_sd:			; CHECK-LABEL: test_x86_fma_vfnmsub_sd:
	; CHECK-NEXT: # BB#0:			; CHECK-NEXT: # BB#0:
	;			;
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rcx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rcx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vfnmsub213sd (%r8), %xmm1, %xmm0			; CHECK-FMA-WIN-NEXT: vfnmsub132sd (%rdx), %xmm1, %xmm0
	;			;
	; CHECK-FMA-NEXT: vfnmsub213sd %xmm2, %xmm1, %xmm0			; CHECK-FMA-NEXT: vfnmsub213sd %xmm2, %xmm1, %xmm0
	;			;
	; CHECK-FMA4-NEXT: vfnmsubsd %xmm2, %xmm1, %xmm0, %xmm0			; CHECK-FMA4-NEXT: vfnmsubsd %xmm2, %xmm1, %xmm0, %xmm0
	;			;
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <2 x double> @llvm.x86.fma.vfnmsub.sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2)			%res = call <2 x double> @llvm.x86.fma.vfnmsub.sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2)
	ret <2 x double> %res			ret <2 x double> %res
	}			}

	define <2 x double> @test_x86_fma_vfnmsub_bac_sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2) #0 {			define <2 x double> @test_x86_fma_vfnmsub_bac_sd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2) #0 {
	; CHECK-LABEL: test_x86_fma_vfnmsub_bac_sd:			; CHECK-LABEL: test_x86_fma_vfnmsub_bac_sd:
	; CHECK-NEXT: # BB#0:			; CHECK-NEXT: # BB#0:
	;			;
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vmovaps (%{{(rcx|rdx)}}), %xmm{{0|1}}			; CHECK-FMA-WIN-NEXT: vmovap{{s|d}} {{\(%rdx\), %xmm0|\(%r8\), %xmm1}}
	; CHECK-FMA-WIN-NEXT: vfnmsub213sd (%r8), %xmm1, %xmm0			; CHECK-FMA-WIN-NEXT: vfnmsub132sd (%rcx), %xmm1, %xmm0
	;			;
	; CHECK-FMA-NEXT: vfnmsub213sd %xmm2, %xmm0, %xmm1			; CHECK-FMA-NEXT: vfnmsub213sd %xmm2, %xmm0, %xmm1
	; CHECK-FMA-NEXT: vmovaps %xmm1, %xmm0			; CHECK-FMA-NEXT: vmovaps %xmm1, %xmm0
	;			;
	; CHECK-FMA4-NEXT: vfnmsubsd %xmm2, %xmm0, %xmm1, %xmm0			; CHECK-FMA4-NEXT: vfnmsubsd %xmm2, %xmm0, %xmm1, %xmm0
	;			;
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%res = call <2 x double> @llvm.x86.fma.vfnmsub.sd(<2 x double> %a1, <2 x double> %a0, <2 x double> %a2)			%res = call <2 x double> @llvm.x86.fma.vfnmsub.sd(<2 x double> %a1, <2 x double> %a0, <2 x double> %a2)