This is an archive of the discontinued LLVM Phabricator instance.

[X86] Remove DenseMap for storing FMA3 grouping information
Abandoned (Public)

Authored by craig.topper on Aug 25 2016, 11:41 PM.


Details

Reviewers

spatel
v_klochkov
RKSimon
delena
mkuper

Summary

This patch removes the DenseMap for keeping track of FMA3 grouping information, avoiding the startup cost of populating the map and the associated memory usage.

Three new bits are added to the instruction TSFlags for keeping track of the form (132, 213, 231) and whether it is a scalar intrinsic instruction.
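The encoding described above can be sketched as follows. This is a minimal illustration rather than the patch itself: the shift value of 55 is taken from the bit numbers mentioned later in the review, and the names `FormShift`, `encodeForm`, `getForm`, and `isIntrinsic` are stand-ins for the patch's `X86II` constants and helpers.

```cpp
#include <cassert>
#include <cstdint>

// Assumed bit positions; the patch derives the real shift from EVEX_RCShift.
constexpr unsigned FormShift = 55;
constexpr uint64_t FormMask = 3ULL << FormShift;
constexpr uint64_t IntrinsicMask = 1ULL << (FormShift + 2);

enum Form : unsigned { Form132 = 0, Form213 = 1, Form231 = 2 };

// Store Form + 1 so that an all-zero field means "not an FMA3 instruction".
constexpr uint64_t encodeForm(Form F) {
  return static_cast<uint64_t>(F + 1) << FormShift;
}

constexpr bool isFMA3(uint64_t TSFlags) { return (TSFlags & FormMask) != 0; }

// Undo the +1 bias to recover 0, 1 or 2.
inline unsigned getForm(uint64_t TSFlags) {
  assert(isFMA3(TSFlags) && "Not an FMA3 instruction?");
  return static_cast<unsigned>((TSFlags & FormMask) >> FormShift) - 1;
}

constexpr bool isIntrinsic(uint64_t TSFlags) {
  return (TSFlags & IntrinsicMask) != 0;
}
```

Biasing the stored value by one lets two bits cover the three forms and still leave zero free as the "non-FMA3" marker, so no separate "is FMA3" bit is needed.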

The 3 different forms of each instruction are combined into groups similar to the current code, but each group is stored in a static table. Each table is sorted by the opcode of each form. Since opcode values are assigned alphabetically and each form is named the same except for the 132, 213, or 231, when one form is sorted the other two forms are sorted as well. With the tables sorted, we can find the group for a given opcode by getting the form from the TSFlags and doing a binary search through the appropriate column of the table.
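A minimal sketch of the column-wise binary search, assuming a three-column row type. The `Group` struct, the `Table` contents, and the opcode numbers are made up for illustration; only the idea (pick the column from the form, then binary-search it) comes from the summary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>

// One row per FMA3 group; columns are indexed by form
// (0 = 132, 1 = 213, 2 = 231).
struct Group {
  uint16_t Opcodes[3];
};

// Made-up opcode numbers. Every column is ascending, which is what the
// patch expects to fall out of alphabetical opcode numbering.
static const Group Table[] = {
    {{100, 101, 102}},
    {{110, 111, 112}},
    {{120, 121, 122}},
};

// Returns the group containing Opcode in column Form, or nullptr.
inline const Group *findGroup(unsigned Opcode, unsigned Form) {
  assert(Form < 3 && "Form must be 0, 1 or 2");
  auto Less = [Form](const Group &G, unsigned Opc) {
    return G.Opcodes[Form] < Opc;
  };
  const Group *I =
      std::lower_bound(std::begin(Table), std::end(Table), Opcode, Less);
  if (I != std::end(Table) && I->Opcodes[Form] == Opcode)
    return I;
  return nullptr;
}

// Sanity check that a column really is sorted; the binary search above is
// only valid when this holds.
inline bool isColumnSorted(unsigned Form) {
  return std::is_sorted(std::begin(Table), std::end(Table),
                        [Form](const Group &A, const Group &B) {
                          return A.Opcodes[Form] < B.Opcodes[Form];
                        });
}
```

For example, `findGroup(111, 1)` returns the middle row, while `findGroup(111, 0)` returns `nullptr` because 111 does not appear in the 132 column.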

There are 6 tables, split by the evex.b bit, memory/register, and masked/unmasked. The two evex.b tables contain masked and unmasked together. The masked/unmasked split for non evex.b makes it easy to populate the load folding tables. The instructions that use evex.b cannot be folded. For the tables without evex.b, the register tables are the same size as their equivalent memory tables and the opcodes are in the same order. Converting from register form to memory form is as simple as finding the row in one table and looking up the same row in the opposite table. Which table an opcode belongs to can be determined from other TSFlags bits.
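The register-to-memory conversion can be pictured with two parallel tables. The sketch below uses invented opcode numbers and a hypothetical `regToMem` helper, and it scans linearly to stay short; the patch would locate the row with the binary search described earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Group {
  uint16_t Opcodes[3]; // indexed by form: 0 = 132, 1 = 213, 2 = 231
};

// Hypothetical opcode numbers. Row i of MemTable is the load-folded
// counterpart of row i of RegTable.
static const Group RegTable[] = {{{200, 210, 220}}, {{201, 211, 221}}};
static const Group MemTable[] = {{{300, 310, 320}}, {{301, 311, 321}}};
constexpr size_t NumRows = sizeof(RegTable) / sizeof(RegTable[0]);
static_assert(sizeof(RegTable) == sizeof(MemTable),
              "register and memory tables must have the same number of rows");

// Returns the memory-form opcode for RegOpc, or 0 if RegOpc is not in the
// register table.
inline unsigned regToMem(unsigned RegOpc, unsigned Form) {
  assert(Form < 3 && "Form must be 0, 1 or 2");
  for (size_t Row = 0; Row != NumRows; ++Row)
    if (RegTable[Row].Opcodes[Form] == RegOpc)
      return MemTable[Row].Opcodes[Form];
  return 0;
}
```

Because the two tables share row order, no explicit register-to-memory mapping has to be stored; the row index is the mapping.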

Currently the getFMA3Group function is a private function in X86InstrInfo.cpp, but could be made a static function in the X86InstrInfo class if it becomes needed outside this file.

The load folding table creation as well as the commuting code has been updated to use the new interface. The commuting code makes use of new and existing TSFlags to determine additional information about the opcodes beyond which group they are in.

Diff Detail

Event Timeline

craig.topper updated this revision to Diff 69316. Aug 25 2016, 11:41 PM

craig.topper retitled this revision from to [X86] Remove DenseMap for storing FMA3 grouping information.

craig.topper updated this object.

craig.topper added reviewers: v_klochkov, RKSimon, mkuper, delena, spatel.

craig.topper added a subscriber: llvm-commits.

`Hi Craig,

Unfortunately, you were not aware of my bigger plans for the classes X86InstrFMA3Info/X86InstrFMA3Group (you planned to remove them in your change-set).

I like some parts of your patch,
for example, the usage of TSFlags instead of KMergeMasked, KZeroMasked bits in X86InstrFMA3Group::Attributes.

There are though some serious concerns from my side as your patch would ruin my plans to re-use the classes X86InstrFMA3Info/X86InstrFMA3Group (added previously in https://reviews.llvm.org/rL278431)
in some new very advanced FMA optimization I am developing right now.
Please, see my concerns in (2) and especially (3) below.

Also, adding 3 bits to TSFlags specifically for FMA3 opcodes probably would require X86 component owner's attention/review.

At the end of this message I added some ideas on how to make the current classes X86InstrFMA3{Info/Group} a bit more efficient and easier to use in the future.

1. Minor. It is good that the DenseMap is replaced with static arrays and there is no initialization overhead now.

The opcodes in the arrays must already be in sorted order as initialized, and you rely on the order in which the opcodes are generated from the *.td files. That seems a bit hacky/risky to me, but is OK as you added is_sorted() checks.

Adding an explicit call to a std::sort()-like function to eliminate such assumptions would add overhead and defeat the main point of these changes.

2. Major. FMA3 is just a small subset of X86 instructions. I think FMA3 instructions do not deserve 3 bits in TSFlags. You used bits #55, 56, 57. Only 6 bits are left for some probably more important features; after that the bit field would have to be extended, and that would affect all opcodes.

a) 2 bits: {NotFMA3, FMA3_132, FMA3_213, FMA3_231}, I don't have very strong opinion about these 2 bits for FMA3 but think this is too many bits for just FMA3 opcodes.

b) The 1 bit 'FMA3Intrinsic' is definitely too big a gift for FMA3 opcodes. I would rather vote for an 'IsIntrinsic' bit, which would be set for all *_Int opcodes such as 'ADDSDrr_Int', etc.

3. MAJOR. I am implementing a new advanced FMA optimization and planned to add more bits to X86InstrFMA3Group::Attributes (which this change-set removes).

a) I planned to add information about signs (i.e. FMA, FMS, FNMA, FNMS, FMADDSUB, FMSUBADD). Adding new fields to X86InstrFMA3Group::Attributes is quite natural and OK, but adding 3 more bits to TSFlags is not an option.

b) The optimization also needs to operate with MVT, and I planned to add an MVT field to X86InstrFMA3Group::Attributes as well. Currently, I don't see how to extract information about the number and size of processed elements (i.e. f32, f64, v4f32, v2f64, ...) from the opcode.

So, with (a) and (b), that optimization would send requests to X86InstrFMA3Info to get FMA opcodes:

		unsigned getFMA213Opcode(bool IsEVEX, bool MulSign, bool AddSign, MVT VT, bool KMasked=false, bool KZMasked=false);

or alternatively:

		unsigned getFMA213Opcode(bool IsEVEX, MVT VT, bool KMasked=false, bool KZMasked=false);
		unsigned getFMS213Opcode(bool IsEVEX, MVT VT, bool KMasked=false, bool KZMasked=false);
		unsigned getFNMA213Opcode(bool IsEVEX, MVT VT, bool KMasked=false, bool KZMasked=false);
		unsigned getFNMS213Opcode(bool IsEVEX, MVT VT, bool KMasked=false, bool KZMasked=false);

For example:

		bool UseEVEX = HasEVEX && HasVL;
		unsigned Opc213 = getFMA213Opcode(UseEVEX, true, false, MVT::v4f64);
		// Opc213 is initialized with (UseEVEX ? VFNMADD213PDZ256r : VFNMADD213PDYr).
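A rough sketch of how such an attribute-based query could work over a small table of groups. Everything here is hypothetical: `VT` stands in for `llvm::MVT`, the `Opc` enumerators stand in for real X86 opcodes, and `GroupInfo` is an invented layout. `MulSign = true` is read as "negated product", matching the VFNMADD example above; treating `AddSign = true` as "subtract the addend" is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for llvm::MVT and for real X86 opcode enumerators.
enum class VT : uint8_t { f32, f64, v4f32, v2f64, v4f64 };
enum Opc : uint16_t {
  NoOpc = 0,
  FMADD213_VEX, FMADD213_EVEX,
  FNMADD213_VEX, FNMADD213_EVEX,
  FMSUB213_VEX, FMSUB213_EVEX,
};

struct GroupInfo {
  bool MulSign;     // true: the product is negated (FNMA/FNMS)
  bool AddSign;     // true: the addend is subtracted (assumed meaning)
  VT Type;          // element type and count handled by the group
  uint16_t VEX213;  // 213-form register opcode, VEX encoding
  uint16_t EVEX213; // 213-form register opcode, EVEX encoding
};

static const GroupInfo Groups[] = {
    {false, false, VT::v4f64, FMADD213_VEX, FMADD213_EVEX},
    {true, false, VT::v4f64, FNMADD213_VEX, FNMADD213_EVEX},
    {false, true, VT::v4f64, FMSUB213_VEX, FMSUB213_EVEX},
};

// Linear search over the groups; returns NoOpc when nothing matches.
inline unsigned getFMA213Opcode(bool IsEVEX, bool MulSign, bool AddSign,
                                VT Type) {
  for (const GroupInfo &G : Groups)
    if (G.MulSign == MulSign && G.AddSign == AddSign && G.Type == Type)
      return IsEVEX ? G.EVEX213 : G.VEX213;
  return NoOpc;
}
```

With this table, `getFMA213Opcode(true, true, false, VT::v4f64)` selects the EVEX FNMADD entry, mirroring the example in the comment.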

Some ideas:
I was thinking about using a different approach to reduce the number
of small static arrays/FMA3Groups (it was one of your major concerns in https://reviews.llvm.org/rL278431)
FMA3Groups could be consisting of the bigger number of fields, i.e.:

// VEX (AVX2) opcodes.
uint16_t VEX_Reg132, VEX_Reg213, VEX_Reg231;
uint16_t VEX_Mem132, VEX_Mem213, VEX_Mem231;

// EVEX opcodes.
uint16_t EVEX_Reg132, EVEX_Reg213, EVEX_Reg231;
uint16_t EVEX_Mem132, EVEX_Mem213, EVEX_Mem231;

// k-masked opcodes.
uint16_t EVEX_KReg132, EVEX_KReg213, EVEX_KReg231;
uint16_t EVEX_KMem132, EVEX_KMem213, EVEX_KMem231;

// k-zero-masked opcodes.
uint16_t EVEX_KZReg132, EVEX_KZReg213, EVEX_KZReg231;
uint16_t EVEX_KZMem132, EVEX_KZMem213, EVEX_KZMem231;

// EVEX.B: Opcodes with explicit round control and with broadcast.
uint16_t EVEX_ERound132, EVEX_ERound213, EVEX_ERound231;
uint16_t EVEX_Broadcast132, EVEX_Broadcast213, EVEX_Broadcast231;

unsigned Attributes; // MVT + {FMA,FMS,FNMA,FNMS,FMADDSUB,FMSUBADD} + IsIntrinsic + ...

Also, FMA3Groups could even include *_Int opcodes as well.
(i.e. uint16_t VEX_Reg132, VEX_Reg213,..., VEX_Reg132_Int, VEX_Reg213_Int,...;)

Making groups bigger would improve the search for the FMA opcodes mentioned in (3) above
(i.e. the getFMA213Opcode() method), because there would then be a quite small number of FMA groups, and even a linear search would probably be OK.

Thank you,
Vyacheslav Klochkov
`

Hi Vyacheslav,

I suspected this would interfere with your plans. We had spoken a little about your optimization plans, but I wasn't sure if your optimization would run before or after isel.

I agree the FMA3 intrinsic bit was bad. I did remove 4 TSFlags bits before I made this patch, so I did pay for myself, but you're right.

I believe you can get single precision or double precision from the VEX_W bit in TSFlags. But I don't think you can get packed vs scalar, or add vs sub vs addsub vs subadd, etc. without just hardcoding all the opcodes. There does appear to be at least some pattern to the base opcode bytes, though. For instance, all 132 opcodes are 0x96-0x9f, all 213 opcodes are 0xa6-0xaf, and all 231 opcodes are 0xb6-0xbf. So you can infer the form from the high nibble. Not sure if we should rely on that without some way to check it.
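One possible shape for that inference plus a check, as a sketch only. The helper names are invented; the opcode bytes in `KnownOpcodes` are the packed vfmadd/vfmsub/vfmaddsub values that appear in the .td changes of this diff. The base opcode byte alone does not prove an instruction is FMA3, since other instructions may share the byte under different prefixes, so a real implementation would need extra checks.

```cpp
#include <cassert>
#include <cstdint>

// Returns 0, 1 or 2 for the 132, 213 and 231 forms, or -1 if the base
// opcode byte lies outside 0x96-0x9f, 0xa6-0xaf and 0xb6-0xbf.
inline int formFromBaseOpcode(uint8_t BaseOpc) {
  unsigned High = BaseOpc >> 4;
  unsigned Low = BaseOpc & 0xf;
  if (High < 0x9 || High > 0xb || Low < 0x6)
    return -1;
  return static_cast<int>(High - 0x9);
}

// Explicit list used to cross-check the nibble rule.
struct Known {
  uint8_t BaseOpc;
  int Form;
};
static const Known KnownOpcodes[] = {
    {0x98, 0}, {0xA8, 1}, {0xB8, 2}, // vfmadd
    {0x9A, 0}, {0xAA, 1}, {0xBA, 2}, // vfmsub
    {0x96, 0}, {0xA6, 1}, {0xB6, 2}, // vfmaddsub
};

// True if the nibble rule agrees with every explicitly listed opcode.
inline bool verifyKnownOpcodes() {
  for (const Known &K : KnownOpcodes)
    if (formFromBaseOpcode(K.BaseOpc) != K.Form)
      return false;
  return true;
}
```

A check like `verifyKnownOpcodes()` could run once under an assertion, so the shortcut fails loudly if a future opcode breaks the pattern.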

I think I like some of your suggestions. I'll go see what I can come up with based on that and with your other requirements in mind.

`Craig,

Your comment regarding using the base opcode byte was a good surprise for me.
I did not realize that it is possible to use it and that it is always available in TSFlags.

I put FMA3 opcodes into a table.
The 9, a, b columns are the upper 4 bits of the opcode byte.
The 6,7,8,9,a,b,c,d,e,f - rows are the lower 4 bits of the opcode byte.
For example, fmadd132ps opcode has the base opcode byte = 0x98.

			9					a					b
		6	fmaddsub132ps/pd	fmaddsub213ps/pd	fmaddsub231ps/pd
		7	fmsubadd132ps/pd	fmsubadd213ps/pd	fmsubadd231ps/pd
		8	fmadd132ps/pd		fmadd213ps/pd		fmadd231ps/pd
		9	fmadd132ss/sd		fmadd213ss/sd		fmadd231ss/sd
		a	fmsub132ps/pd		fmsub213ps/pd		fmsub231ps/pd
		b	fmsub132ss/sd		fmsub213ss/sd		fmsub231ss/sd
		c	fnmadd132ps/pd		fnmadd213ps/pd		fnmadd231ps/pd
		d	fnmadd132ss/sd		fnmadd213ss/sd		fnmadd231ss/sd
		e	fnmsub132ps/pd		fnmsub213ps/pd		fnmsub231ps/pd
		f	fnmsub132ss/sd		fnmsub213ss/sd		fnmsub231ss/sd

It would be possible to have functions similar to the ones below
(I did not check there that the 'Opcode' is FMA3):

		// If this implementation is not safe, then we could have the same binary search through static arrays with FMA opcodes.
		bool isFMA3(uint8_t Opcode) {
		// <Do some additional checks here to avoid possible(?) overlapping
		// with other/future opcodes having the same opcode byte, but different prefixes/attributes.>
		uint8_t LowB = Opcode & 0xf;
		uint8_t HighB = Opcode >> 4;
		return HighB >= 0x9 && HighB <= 0xb &&
				LowB >= 0x6 /* && LowB <= 0xf*/;
		}

		bool isFMA3Form132(uint8_t Opcode) { return (Opcode >> 4) == 0x9; }
		bool isFMA3Form213(uint8_t Opcode) { return (Opcode >> 4) == 0xa; }
		bool isFMA3Form231(uint8_t Opcode) { return (Opcode >> 4) == 0xb; }
	
		bool isFMA3Scalar(uint8_t Opcode) {
		uint8_t B = Opcode & 0x0f;
		return B == 0x9 || B == 0xb || B == 0xd || B == 0xf;
		}

		bool isFMA3_FMADD(uint8_t Opcode) {
		uint8_t B = Opcode & 0xf;
		return B == 0x8 || B == 0x9;
		}
		bool isFMA3_FMSUB(uint8_t Opcode) {
		uint8_t B = Opcode & 0xf;
		return B == 0xa || B == 0xb;
		}
		bool isFMA3_FNMADD(uint8_t Opcode) {
		uint8_t B = Opcode & 0xf;
		return B == 0xc || B == 0xd;
		}
		bool isFMA3_FNMSUB(uint8_t Opcode) {
		uint8_t B = Opcode & 0xf;
		return B == 0xe || B == 0xf;
		}
		bool isFMA3_FMADDSUB(uint8_t Opcode) { return (Opcode & 0xf) == 0x6; }
		bool isFMA3_FMSUBADD(uint8_t Opcode) { return (Opcode & 0xf) == 0x7; }

With such solution you would not need to add those 2 bits to TSFlags for 132/213/231 forms.

We also would have the methods that can distinguish 132 vs 213 vs 231
and FMA vs FMS vs FNMA vs FNMS vs FMADDSUB vs FMSUBADD.

Unfortunately, I still would have quite long linear search for methods like this:

		unsigned getFMA213Opcode(bool IsEVEX, MVT VT) {
		// !!! Linear search through about 300-400 opcodes in FMA3RegOpcodes table.
		for (Opc = <each entry of FMA3RegOpcodes table>) {
		  uint8_t BaseOpcode = ...getBaseOpcodeFor(TSFlags);

		  if (isFMA3_FMADD(BaseOpcode) &&
			<check IsEVEX and VT here>)
		    return Opc;
		}
		}

There are at least 2 ways to work around that:

1. Reduce the number of entries in FMA3RegOpcodes by having 1 static array of structures like I mentioned in the previous comment (i.e. describing bigger FMA families/groups, where 1 group would include {VEX}x{Reg,Mem} + {EVEX}x{Reg,Mem,KReg,KMem,KZReg,KZMem,Broadcast,KBroadcast,KZBroadcast,Round,KRound,KZRound}).

2. Do some sort of hashing in my new FMA optimization:

		// Just the idea.
		unsigned FMAOpc = <iterate through FMA3RegOpcodes>;
		uint8_t FMAOpcByte = <extract opcode byte from TSFlags>;
		if (isFMA3_FMADD(FMAOpcByte))
			FMADDOperations[FMADDOperationsIndex++] = <current index in FMA3RegOpcodes>;
		else if (isFMA3_FMSUB(FMAOpcByte))
			FMSUBOperations[FMSUBOperationsIndex++] = <current index in FMA3RegOpcodes>;
		else if (isFMA3_FNMADD(FMAOpcByte))
			FNMADDOperations[FNMADDOperationsIndex++] = <current index in FMA3RegOpcodes>;
		...

Even with (2), I think having bigger FMA3 groups may be useful in future.
For example, it would be convenient if/when we need to add support for broadcast+operation folding in the Peephole optimizer:
t1 = vbroadcastss <mem>
t2 = VFMADD213PSZr a, b, t1;
-->
t2 = VFMADD213PSZmb a, b, [broadcast <mem>]
Bigger groups would just easily return VFMADD213PSZmb for VFMADD213PSZr.
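As an illustration of that lookup, here is a small sketch with a cut-down group holding only three of the proposed fields. The `Opc` enumerators and the `BigGroup` layout are hypothetical stand-ins; real groups would carry the full set of VEX/EVEX/masked/broadcast/rounding members listed in the earlier comment.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for real X86 opcode enumerators.
enum Opc : uint16_t {
  NoOpc = 0,
  FMADD213PSZr, FMADD213PSZm, FMADD213PSZmb,
  FMSUB213PSZr, FMSUB213PSZm, FMSUB213PSZmb,
};

// A reduced "big group": register, memory and broadcast-memory members of
// the 213 form only.
struct BigGroup {
  uint16_t Reg213;
  uint16_t Mem213;
  uint16_t Broadcast213;
};

static const BigGroup Groups[] = {
    {FMADD213PSZr, FMADD213PSZm, FMADD213PSZmb},
    {FMSUB213PSZr, FMSUB213PSZm, FMSUB213PSZmb},
};

// Maps a register-form opcode to its broadcast-memory sibling, or NoOpc if
// the opcode is not the register member of any group.
inline unsigned getBroadcastForm(unsigned RegOpc) {
  for (const BigGroup &G : Groups)
    if (G.Reg213 == RegOpc)
      return G.Broadcast213;
  return NoOpc;
}
```

With few, wide groups a linear scan like this stays cheap, which is the trade-off the comment argues for.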

Thank you,
Slava
`

v_klochkov edited edge metadata. Aug 30 2016, 3:37 PM

v_klochkov added subscribers: DavidKreitzer, qcolombet.

@v_klochkov @craig.topper It's been almost 6 months - is anything happening with this?

craig.topper abandoned this revision. Feb 23 2017, 9:32 PM


Diff 69316

lib/Target/X86/CMakeLists.txt

...	set(sources
X86FastISel.cpp		X86FastISel.cpp
X86FixupBWInsts.cpp		X86FixupBWInsts.cpp
X86FixupLEAs.cpp		X86FixupLEAs.cpp
X86FixupSetCC.cpp		X86FixupSetCC.cpp
X86FloatingPoint.cpp		X86FloatingPoint.cpp
X86FrameLowering.cpp		X86FrameLowering.cpp
X86ISelDAGToDAG.cpp		X86ISelDAGToDAG.cpp
X86ISelLowering.cpp		X86ISelLowering.cpp
X86InstrFMA3Info.cpp
X86InstrInfo.cpp		X86InstrInfo.cpp
X86MCInstLower.cpp		X86MCInstLower.cpp
X86MachineFunctionInfo.cpp		X86MachineFunctionInfo.cpp
X86OptimizeLEAs.cpp		X86OptimizeLEAs.cpp
X86PadShortFunction.cpp		X86PadShortFunction.cpp
X86RegisterInfo.cpp		X86RegisterInfo.cpp
X86SelectionDAGInfo.cpp		X86SelectionDAGInfo.cpp
X86ShuffleDecodeConstantPool.cpp		X86ShuffleDecodeConstantPool.cpp

lib/Target/X86/MCTargetDesc/X86BaseInfo.h

...	namespace X86 {
/// avx512fintrin.h.		/// avx512fintrin.h.
enum STATIC_ROUNDING {		enum STATIC_ROUNDING {
TO_NEAREST_INT = 0,		TO_NEAREST_INT = 0,
TO_NEG_INF = 1,		TO_NEG_INF = 1,
TO_POS_INF = 2,		TO_POS_INF = 2,
TO_ZERO = 3,		TO_ZERO = 3,
CUR_DIRECTION = 4		CUR_DIRECTION = 4
};		};

		/// FMA3 form constants.
		enum {
		FMA3Form132 = 0,
		FMA3Form213 = 1,
		FMA3Form231 = 2,
		};
} // end namespace X86;		} // end namespace X86;

/// X86II - This namespace holds all of the target specific flags that		/// X86II - This namespace holds all of the target specific flags that
/// instruction info tracks.		/// instruction info tracks.
///		///
namespace X86II {		namespace X86II {
/// Target Operand Flag enum.		/// Target Operand Flag enum.
enum TOF {		enum TOF {
...	enum : uint64_t {
/// storing a classifier in the imm8 field. To simplify our implementation,		/// storing a classifier in the imm8 field. To simplify our implementation,
/// we handle this by storeing the classifier in the opcode field and using		/// we handle this by storeing the classifier in the opcode field and using
/// this flag to indicate that the encoder should do the wacky 3DNow! thing.		/// this flag to indicate that the encoder should do the wacky 3DNow! thing.
Has3DNow0F0FOpcodeShift = CD8_Scale_Shift + 7,		Has3DNow0F0FOpcodeShift = CD8_Scale_Shift + 7,
Has3DNow0F0FOpcode = 1ULL << Has3DNow0F0FOpcodeShift,		Has3DNow0F0FOpcode = 1ULL << Has3DNow0F0FOpcodeShift,

/// Explicitly specified rounding control		/// Explicitly specified rounding control
EVEX_RCShift = Has3DNow0F0FOpcodeShift + 1,		EVEX_RCShift = Has3DNow0F0FOpcodeShift + 1,
EVEX_RC = 1ULL << EVEX_RCShift		EVEX_RC = 1ULL << EVEX_RCShift,

		/// FMA3Form - If this an FMA3 instruction indicates whether this is the
		/// 132, 213, or 231 form. 0 means non FMA instruction.
		FMA3FormShift = EVEX_RCShift + 1,
		FMA3FormMask = 3ULL << FMA3FormShift,
		FMA3_132 = (uint64_t)(X86::FMA3Form132 + 1) << FMA3FormShift,
		FMA3_213 = (uint64_t)(X86::FMA3Form213 + 1) << FMA3FormShift,
		FMA3_231 = (uint64_t)(X86::FMA3Form231 + 1) << FMA3FormShift,

		/// FMA3Intrinsic - Indicates if this an FMA3 scalar intrinsic instruction.
		FMA3IntrinsicShift = FMA3FormShift + 2,
		FMA3IntrinsicMask = 1ULL << FMA3IntrinsicShift,
};		};

		/// isFMA3 - Is this an FMA3 instruction.
		inline bool isFMA3(uint64_t TSFlags) {
		return (TSFlags & X86II::FMA3FormMask) != 0;
		}

		/// getFMA3Form - Returns whether this a 132, 213, or 231 FMA3 form.
		inline unsigned getFMA3Form(uint64_t TSFlags) {
		assert(isFMA3(TSFlags) && "Not an FMA3 instruction?");
		return ((TSFlags & X86II::FMA3FormMask) >> FMA3FormShift) - 1;
		}

		/// isFMA3Intrinsics - Is this an FMA3 scalar intrinsic instruction.
		inline bool isFMA3Intrinsic(uint64_t TSFlags) {
		return (TSFlags & X86II::FMA3IntrinsicMask) != 0;
		}

		/// isKMasked - Is this a masked instruction.
		inline bool isKMasked(uint64_t TSFlags) {
		return (TSFlags & X86II::EVEX_K) != 0;
		}

		/// isKMergedMasked - Is this a merge masked instruction.
		inline bool isKMergeMasked(uint64_t TSFlags) {
		return isKMasked(TSFlags) && (TSFlags & X86II::EVEX_Z) == 0;
		}

// getBaseOpcodeFor - This function returns the "base" X86 opcode for the		// getBaseOpcodeFor - This function returns the "base" X86 opcode for the
// specified machine instruction.		// specified machine instruction.
//		//
inline unsigned char getBaseOpcodeFor(uint64_t TSFlags) {		inline unsigned char getBaseOpcodeFor(uint64_t TSFlags) {
return TSFlags >> X86II::OpcodeShift;		return TSFlags >> X86II::OpcodeShift;
}		}

inline bool hasImm(uint64_t TSFlags) {		inline bool hasImm(uint64_t TSFlags) {

lib/Target/X86/X86InstrAVX512.td


...	let Predicates = [HasVLX, HasAVX512] in {
defm Z128 : avx512_fma3p_213_rm<opc, OpcodeStr, OpNode, _.info128, Suff>,		defm Z128 : avx512_fma3p_213_rm<opc, OpcodeStr, OpNode, _.info128, Suff>,
EVEX_V128, EVEX_CD8<_.info128.EltSize, CD8VF>;		EVEX_V128, EVEX_CD8<_.info128.EltSize, CD8VF>;
}		}
}		}

multiclass avx512_fma3p_213_f<bits<8> opc, string OpcodeStr, SDNode OpNode,		multiclass avx512_fma3p_213_f<bits<8> opc, string OpcodeStr, SDNode OpNode,
SDNode OpNodeRnd > {		SDNode OpNodeRnd > {
defm PS : avx512_fma3p_213_common<opc, OpcodeStr#"ps", OpNode, OpNodeRnd,		defm PS : avx512_fma3p_213_common<opc, OpcodeStr#"ps", OpNode, OpNodeRnd,
avx512vl_f32_info, "PS">;		avx512vl_f32_info, "PS">, FMA3_213;
defm PD : avx512_fma3p_213_common<opc, OpcodeStr#"pd", OpNode, OpNodeRnd,		defm PD : avx512_fma3p_213_common<opc, OpcodeStr#"pd", OpNode, OpNodeRnd,
avx512vl_f64_info, "PD">, VEX_W;		avx512vl_f64_info, "PD">, VEX_W, FMA3_213;
}		}

defm VFMADD213 : avx512_fma3p_213_f<0xA8, "vfmadd213", X86Fmadd, X86FmaddRnd>;		defm VFMADD213 : avx512_fma3p_213_f<0xA8, "vfmadd213", X86Fmadd, X86FmaddRnd>;
defm VFMSUB213 : avx512_fma3p_213_f<0xAA, "vfmsub213", X86Fmsub, X86FmsubRnd>;		defm VFMSUB213 : avx512_fma3p_213_f<0xAA, "vfmsub213", X86Fmsub, X86FmsubRnd>;
defm VFMADDSUB213 : avx512_fma3p_213_f<0xA6, "vfmaddsub213", X86Fmaddsub, X86FmaddsubRnd>;		defm VFMADDSUB213 : avx512_fma3p_213_f<0xA6, "vfmaddsub213", X86Fmaddsub, X86FmaddsubRnd>;
defm VFMSUBADD213 : avx512_fma3p_213_f<0xA7, "vfmsubadd213", X86Fmsubadd, X86FmsubaddRnd>;		defm VFMSUBADD213 : avx512_fma3p_213_f<0xA7, "vfmsubadd213", X86Fmsubadd, X86FmsubaddRnd>;
defm VFNMADD213 : avx512_fma3p_213_f<0xAC, "vfnmadd213", X86Fnmadd, X86FnmaddRnd>;		defm VFNMADD213 : avx512_fma3p_213_f<0xAC, "vfnmadd213", X86Fnmadd, X86FnmaddRnd>;
defm VFNMSUB213 : avx512_fma3p_213_f<0xAE, "vfnmsub213", X86Fnmsub, X86FnmsubRnd>;		defm VFNMSUB213 : avx512_fma3p_213_f<0xAE, "vfnmsub213", X86Fnmsub, X86FnmsubRnd>;
...	let Predicates = [HasVLX, HasAVX512] in {
defm Z128 : avx512_fma3p_231_rm<opc, OpcodeStr, OpNode, _.info128, Suff>,		defm Z128 : avx512_fma3p_231_rm<opc, OpcodeStr, OpNode, _.info128, Suff>,
EVEX_V128, EVEX_CD8<_.info128.EltSize, CD8VF>;		EVEX_V128, EVEX_CD8<_.info128.EltSize, CD8VF>;
}		}
}		}

multiclass avx512_fma3p_231_f<bits<8> opc, string OpcodeStr, SDNode OpNode,		multiclass avx512_fma3p_231_f<bits<8> opc, string OpcodeStr, SDNode OpNode,
SDNode OpNodeRnd > {		SDNode OpNodeRnd > {
defm PS : avx512_fma3p_231_common<opc, OpcodeStr#"ps", OpNode, OpNodeRnd,		defm PS : avx512_fma3p_231_common<opc, OpcodeStr#"ps", OpNode, OpNodeRnd,
avx512vl_f32_info, "PS">;		avx512vl_f32_info, "PS">, FMA3_231;
defm PD : avx512_fma3p_231_common<opc, OpcodeStr#"pd", OpNode, OpNodeRnd,		defm PD : avx512_fma3p_231_common<opc, OpcodeStr#"pd", OpNode, OpNodeRnd,
avx512vl_f64_info, "PD">, VEX_W;		avx512vl_f64_info, "PD">, VEX_W, FMA3_231;
}		}

defm VFMADD231 : avx512_fma3p_231_f<0xB8, "vfmadd231", X86Fmadd, X86FmaddRnd>;		defm VFMADD231 : avx512_fma3p_231_f<0xB8, "vfmadd231", X86Fmadd, X86FmaddRnd>;
defm VFMSUB231 : avx512_fma3p_231_f<0xBA, "vfmsub231", X86Fmsub, X86FmsubRnd>;		defm VFMSUB231 : avx512_fma3p_231_f<0xBA, "vfmsub231", X86Fmsub, X86FmsubRnd>;
defm VFMADDSUB231 : avx512_fma3p_231_f<0xB6, "vfmaddsub231", X86Fmaddsub, X86FmaddsubRnd>;		defm VFMADDSUB231 : avx512_fma3p_231_f<0xB6, "vfmaddsub231", X86Fmaddsub, X86FmaddsubRnd>;
defm VFMSUBADD231 : avx512_fma3p_231_f<0xB7, "vfmsubadd231", X86Fmsubadd, X86FmsubaddRnd>;		defm VFMSUBADD231 : avx512_fma3p_231_f<0xB7, "vfmsubadd231", X86Fmsubadd, X86FmsubaddRnd>;
defm VFNMADD231 : avx512_fma3p_231_f<0xBC, "vfnmadd231", X86Fnmadd, X86FnmaddRnd>;		defm VFNMADD231 : avx512_fma3p_231_f<0xBC, "vfnmadd231", X86Fnmadd, X86FnmaddRnd>;
defm VFNMSUB231 : avx512_fma3p_231_f<0xBE, "vfnmsub231", X86Fnmsub, X86FnmsubRnd>;		defm VFNMSUB231 : avx512_fma3p_231_f<0xBE, "vfnmsub231", X86Fnmsub, X86FnmsubRnd>;
...	let Predicates = [HasVLX, HasAVX512] in {
defm Z128 : avx512_fma3p_132_rm<opc, OpcodeStr, OpNode, _.info128, Suff>,		defm Z128 : avx512_fma3p_132_rm<opc, OpcodeStr, OpNode, _.info128, Suff>,
EVEX_V128, EVEX_CD8<_.info128.EltSize, CD8VF>;		EVEX_V128, EVEX_CD8<_.info128.EltSize, CD8VF>;
}		}
}		}

multiclass avx512_fma3p_132_f<bits<8> opc, string OpcodeStr, SDNode OpNode,		multiclass avx512_fma3p_132_f<bits<8> opc, string OpcodeStr, SDNode OpNode,
SDNode OpNodeRnd > {		SDNode OpNodeRnd > {
defm PS : avx512_fma3p_132_common<opc, OpcodeStr#"ps", OpNode, OpNodeRnd,		defm PS : avx512_fma3p_132_common<opc, OpcodeStr#"ps", OpNode, OpNodeRnd,
avx512vl_f32_info, "PS">;		avx512vl_f32_info, "PS">, FMA3_132;
defm PD : avx512_fma3p_132_common<opc, OpcodeStr#"pd", OpNode, OpNodeRnd,		defm PD : avx512_fma3p_132_common<opc, OpcodeStr#"pd", OpNode, OpNodeRnd,
avx512vl_f64_info, "PD">, VEX_W;		avx512vl_f64_info, "PD">, VEX_W, FMA3_132;
}		}

defm VFMADD132 : avx512_fma3p_132_f<0x98, "vfmadd132", X86Fmadd, X86FmaddRnd>;		defm VFMADD132 : avx512_fma3p_132_f<0x98, "vfmadd132", X86Fmadd, X86FmaddRnd>;
defm VFMSUB132 : avx512_fma3p_132_f<0x9A, "vfmsub132", X86Fmsub, X86FmsubRnd>;		defm VFMSUB132 : avx512_fma3p_132_f<0x9A, "vfmsub132", X86Fmsub, X86FmsubRnd>;
defm VFMADDSUB132 : avx512_fma3p_132_f<0x96, "vfmaddsub132", X86Fmaddsub, X86FmaddsubRnd>;		defm VFMADDSUB132 : avx512_fma3p_132_f<0x96, "vfmaddsub132", X86Fmaddsub, X86FmaddsubRnd>;
defm VFMSUBADD132 : avx512_fma3p_132_f<0x97, "vfmsubadd132", X86Fmsubadd, X86FmsubaddRnd>;		defm VFMSUBADD132 : avx512_fma3p_132_f<0x97, "vfmsubadd132", X86Fmsubadd, X86FmsubaddRnd>;
defm VFNMADD132 : avx512_fma3p_132_f<0x9C, "vfnmadd132", X86Fnmadd, X86FnmaddRnd>;		defm VFNMADD132 : avx512_fma3p_132_f<0x9C, "vfnmadd132", X86Fnmadd, X86FnmaddRnd>;
defm VFNMSUB132 : avx512_fma3p_132_f<0x9E, "vfnmsub132", X86Fnmsub, X86FnmsubRnd>;		defm VFNMSUB132 : avx512_fma3p_132_f<0x9E, "vfnmsub132", X86Fnmsub, X86FnmsubRnd>;

// Scalar FMA		// Scalar FMA
let Constraints = "$src1 = $dst" in {		let Constraints = "$src1 = $dst" in {
multiclass avx512_fma3s_common<bits<8> opc, string OpcodeStr, X86VectorVTInfo _,		multiclass avx512_fma3s_common<bits<8> opc, string OpcodeStr, X86VectorVTInfo _,
dag RHS_VEC_r, dag RHS_VEC_m, dag RHS_VEC_rb,		dag RHS_VEC_r, dag RHS_VEC_m, dag RHS_VEC_rb,
dag RHS_r, dag RHS_m > {		dag RHS_r, dag RHS_m > {
		let FMA3Intrinsic = 1 in {
defm r_Int: AVX512_maskable_3src_scalar<opc, MRMSrcReg, _, (outs _.RC:$dst),		defm r_Int: AVX512_maskable_3src_scalar<opc, MRMSrcReg, _, (outs _.RC:$dst),
(ins _.RC:$src2, _.RC:$src3), OpcodeStr,		(ins _.RC:$src2, _.RC:$src3), OpcodeStr,
"$src3, $src2", "$src2, $src3", RHS_VEC_r, 1, 1>, AVX512FMA3Base;		"$src3, $src2", "$src2, $src3", RHS_VEC_r, 1, 1>, AVX512FMA3Base;

defm m_Int: AVX512_maskable_3src_scalar<opc, MRMSrcMem, _, (outs _.RC:$dst),		defm m_Int: AVX512_maskable_3src_scalar<opc, MRMSrcMem, _, (outs _.RC:$dst),
(ins _.RC:$src2, _.ScalarMemOp:$src3), OpcodeStr,		(ins _.RC:$src2, _.ScalarMemOp:$src3), OpcodeStr,
"$src3, $src2", "$src2, $src3", RHS_VEC_m, 1, 1>, AVX512FMA3Base;		"$src3, $src2", "$src2, $src3", RHS_VEC_m, 1, 1>, AVX512FMA3Base;

defm rb_Int: AVX512_maskable_3src_scalar<opc, MRMSrcReg, _, (outs _.RC:$dst),		defm rb_Int: AVX512_maskable_3src_scalar<opc, MRMSrcReg, _, (outs _.RC:$dst),
(ins _.RC:$src2, _.RC:$src3, AVX512RC:$rc),		(ins _.RC:$src2, _.RC:$src3, AVX512RC:$rc),
OpcodeStr, "$rc, $src3, $src2", "$src2, $src3, $rc", RHS_VEC_rb, 1, 1>,		OpcodeStr, "$rc, $src3, $src2", "$src2, $src3, $rc", RHS_VEC_rb, 1, 1>,
AVX512FMA3Base, EVEX_B, EVEX_RC;		AVX512FMA3Base, EVEX_B, EVEX_RC;
		}

let isCodeGenOnly = 1, isCommutable = 1 in {		let isCodeGenOnly = 1, isCommutable = 1 in {
def r : AVX512FMA3<opc, MRMSrcReg, (outs _.FRC:$dst),		def r : AVX512FMA3<opc, MRMSrcReg, (outs _.FRC:$dst),
(ins _.FRC:$src1, _.FRC:$src2, _.FRC:$src3),		(ins _.FRC:$src1, _.FRC:$src2, _.FRC:$src3),
!strconcat(OpcodeStr,		!strconcat(OpcodeStr,
"\t{$src3, $src2, $dst\|$dst, $src2, $src3}"),		"\t{$src3, $src2, $dst\|$dst, $src2, $src3}"),
[RHS_r]>;		[RHS_r]>;
def m : AVX512FMA3<opc, MRMSrcMem, (outs _.FRC:$dst),		def m : AVX512FMA3<opc, MRMSrcMem, (outs _.FRC:$dst),
...	defm NAME#213#SUFF#Z: avx512_fma3s_common<opc213, OpcodeStr#"213"#_.Suffix , _ ,
(_.VT (OpNodeRnd _.RC:$src2, _.RC:$src1, _.RC:$src3, (i32 FROUND_CURRENT))),		(_.VT (OpNodeRnd _.RC:$src2, _.RC:$src1, _.RC:$src3, (i32 FROUND_CURRENT))),
(_.VT (OpNodeRnd _.RC:$src2, _.RC:$src1,		(_.VT (OpNodeRnd _.RC:$src2, _.RC:$src1,
(_.VT (scalar_to_vector(_.ScalarLdFrag addr:$src3))), (i32 FROUND_CURRENT))),		(_.VT (scalar_to_vector(_.ScalarLdFrag addr:$src3))), (i32 FROUND_CURRENT))),
(_.VT ( OpNodeRnd _.RC:$src2, _.RC:$src1, _.RC:$src3,		(_.VT ( OpNodeRnd _.RC:$src2, _.RC:$src1, _.RC:$src3,
(i32 imm:$rc))),		(i32 imm:$rc))),
(set _.FRC:$dst, (_.EltVT (OpNode _.FRC:$src2, _.FRC:$src1,		(set _.FRC:$dst, (_.EltVT (OpNode _.FRC:$src2, _.FRC:$src1,
_.FRC:$src3))),		_.FRC:$src3))),
(set _.FRC:$dst, (_.EltVT (OpNode _.FRC:$src2, _.FRC:$src1,		(set _.FRC:$dst, (_.EltVT (OpNode _.FRC:$src2, _.FRC:$src1,
(_.ScalarLdFrag addr:$src3))))>;		(_.ScalarLdFrag addr:$src3))))>, FMA3_213;

defm NAME#231#SUFF#Z: avx512_fma3s_common<opc231, OpcodeStr#"231"#_.Suffix , _ ,		defm NAME#231#SUFF#Z: avx512_fma3s_common<opc231, OpcodeStr#"231"#_.Suffix , _ ,
(_.VT (OpNodeRnd _.RC:$src2, _.RC:$src3, _.RC:$src1, (i32 FROUND_CURRENT))),		(_.VT (OpNodeRnd _.RC:$src2, _.RC:$src3, _.RC:$src1, (i32 FROUND_CURRENT))),
(_.VT (OpNodeRnd _.RC:$src2,		(_.VT (OpNodeRnd _.RC:$src2,
(_.VT (scalar_to_vector(_.ScalarLdFrag addr:$src3))),		(_.VT (scalar_to_vector(_.ScalarLdFrag addr:$src3))),
_.RC:$src1, (i32 FROUND_CURRENT))),		_.RC:$src1, (i32 FROUND_CURRENT))),
(_.VT ( OpNodeRnd _.RC:$src2, _.RC:$src3, _.RC:$src1,		(_.VT ( OpNodeRnd _.RC:$src2, _.RC:$src3, _.RC:$src1,
(i32 imm:$rc))),		(i32 imm:$rc))),
(set _.FRC:$dst, (_.EltVT (OpNode _.FRC:$src2, _.FRC:$src3,		(set _.FRC:$dst, (_.EltVT (OpNode _.FRC:$src2, _.FRC:$src3,
_.FRC:$src1))),		_.FRC:$src1))),
(set _.FRC:$dst, (_.EltVT (OpNode _.FRC:$src2,		(set _.FRC:$dst, (_.EltVT (OpNode _.FRC:$src2,
(_.ScalarLdFrag addr:$src3), _.FRC:$src1)))>;		(_.ScalarLdFrag addr:$src3), _.FRC:$src1)))>,
		FMA3_231;

defm NAME#132#SUFF#Z: avx512_fma3s_common<opc132, OpcodeStr#"132"#_.Suffix , _ ,		defm NAME#132#SUFF#Z: avx512_fma3s_common<opc132, OpcodeStr#"132"#_.Suffix , _ ,
(_.VT (OpNodeRnd _.RC:$src1, _.RC:$src3, _.RC:$src2, (i32 FROUND_CURRENT))),		(_.VT (OpNodeRnd _.RC:$src1, _.RC:$src3, _.RC:$src2, (i32 FROUND_CURRENT))),
(_.VT (OpNodeRnd _.RC:$src1,		(_.VT (OpNodeRnd _.RC:$src1,
(_.VT (scalar_to_vector(_.ScalarLdFrag addr:$src3))),		(_.VT (scalar_to_vector(_.ScalarLdFrag addr:$src3))),
_.RC:$src2, (i32 FROUND_CURRENT))),		_.RC:$src2, (i32 FROUND_CURRENT))),
(_.VT ( OpNodeRnd _.RC:$src1, _.RC:$src3, _.RC:$src2,		(_.VT ( OpNodeRnd _.RC:$src1, _.RC:$src3, _.RC:$src2,
(i32 imm:$rc))),		(i32 imm:$rc))),
(set _.FRC:$dst, (_.EltVT (OpNode _.FRC:$src1, _.FRC:$src3,		(set _.FRC:$dst, (_.EltVT (OpNode _.FRC:$src1, _.FRC:$src3,
_.FRC:$src2))),		_.FRC:$src2))),
(set _.FRC:$dst, (_.EltVT (OpNode _.FRC:$src1,		(set _.FRC:$dst, (_.EltVT (OpNode _.FRC:$src1,
(_.ScalarLdFrag addr:$src3), _.FRC:$src2)))>;		(_.ScalarLdFrag addr:$src3), _.FRC:$src2)))>,
		FMA3_132;
}		}

multiclass avx512_fma3s<bits<8> opc213, bits<8> opc231, bits<8> opc132,		multiclass avx512_fma3s<bits<8> opc213, bits<8> opc231, bits<8> opc132,
string OpcodeStr, SDNode OpNode, SDNode OpNodeRnd>{		string OpcodeStr, SDNode OpNode, SDNode OpNodeRnd>{
let Predicates = [HasAVX512] in {		let Predicates = [HasAVX512] in {
defm NAME : avx512_fma3s_all<opc213, opc231, opc132, OpcodeStr, OpNode,		defm NAME : avx512_fma3s_all<opc213, opc231, opc132, OpcodeStr, OpNode,
OpNodeRnd, f32x_info, "SS">,		OpNodeRnd, f32x_info, "SS">,
EVEX_CD8<32, CD8VT1>, VEX_LIG;		EVEX_CD8<32, CD8VT1>, VEX_LIG;

lib/Target/X86/X86InstrFMA.td

	}			}

	multiclass fma3p_forms<bits<8> opc132, bits<8> opc213, bits<8> opc231,			multiclass fma3p_forms<bits<8> opc132, bits<8> opc213, bits<8> opc231,
	string OpcodeStr, string PackTy, string Suff,			string OpcodeStr, string PackTy, string Suff,
	PatFrag MemFrag128, PatFrag MemFrag256,			PatFrag MemFrag128, PatFrag MemFrag256,
	SDNode Op, ValueType OpTy128, ValueType OpTy256> {			SDNode Op, ValueType OpTy128, ValueType OpTy256> {
	defm NAME#213#Suff : fma3p_rm<opc213,			defm NAME#213#Suff : fma3p_rm<opc213,
	!strconcat(OpcodeStr, "213", PackTy),			!strconcat(OpcodeStr, "213", PackTy),
	MemFrag128, MemFrag256, OpTy128, OpTy256, Op>;			MemFrag128, MemFrag256, OpTy128, OpTy256, Op>,
				FMA3_213;
	defm NAME#132#Suff : fma3p_rm<opc132,			defm NAME#132#Suff : fma3p_rm<opc132,
	!strconcat(OpcodeStr, "132", PackTy),			!strconcat(OpcodeStr, "132", PackTy),
	MemFrag128, MemFrag256, OpTy128, OpTy256>;			MemFrag128, MemFrag256, OpTy128, OpTy256>,
				FMA3_132;
	defm NAME#231#Suff : fma3p_rm<opc231,			defm NAME#231#Suff : fma3p_rm<opc231,
	!strconcat(OpcodeStr, "231", PackTy),			!strconcat(OpcodeStr, "231", PackTy),
	MemFrag128, MemFrag256, OpTy128, OpTy256>;			MemFrag128, MemFrag256, OpTy128, OpTy256>,
				FMA3_231;
	}			}

	// Fused Multiply-Add			// Fused Multiply-Add
	let ExeDomain = SSEPackedSingle in {			let ExeDomain = SSEPackedSingle in {
	defm VFMADD : fma3p_forms<0x98, 0xA8, 0xB8, "vfmadd", "ps", "PS",			defm VFMADD : fma3p_forms<0x98, 0xA8, 0xB8, "vfmadd", "ps", "PS",
	loadv4f32, loadv8f32, X86Fmadd, v4f32, v8f32>;			loadv4f32, loadv8f32, X86Fmadd, v4f32, v8f32>;
	defm VFMSUB : fma3p_forms<0x9A, 0xAA, 0xBA, "vfmsub", "ps", "PS",			defm VFMSUB : fma3p_forms<0x9A, 0xAA, 0xBA, "vfmsub", "ps", "PS",
	loadv4f32, loadv8f32, X86Fmsub, v4f32, v8f32>;			loadv4f32, loadv8f32, X86Fmsub, v4f32, v8f32>;
	...
	// Commuting the 2nd and 3rd source register operands of FMAs is quite trivial			// Commuting the 2nd and 3rd source register operands of FMAs is quite trivial
	// and the corresponding optimizations have been developed.			// and the corresponding optimizations have been developed.
	// Commuting the 1st operand of FMA*_Int requires some additional analysis,			// Commuting the 1st operand of FMA*_Int requires some additional analysis,
	// the commute optimization is legal only if all users of FMA*_Int use only			// the commute optimization is legal only if all users of FMA*_Int use only
	// the lowest element of the FMA*_Int instruction. Even though such analysis			// the lowest element of the FMA*_Int instruction. Even though such analysis
	// may not be implemented yet, we allow the routines doing the actual commute			// may not be implemented yet, we allow the routines doing the actual commute
	// transformation to decide if one or another instruction is commutable or not.			// transformation to decide if one or another instruction is commutable or not.
	let Constraints = "$src1 = $dst", isCommutable = 1, isCodeGenOnly = 1,			let Constraints = "$src1 = $dst", isCommutable = 1, isCodeGenOnly = 1,
	hasSideEffects = 0 in			hasSideEffects = 0, FMA3Intrinsic = 1 in
	multiclass fma3s_rm_int<bits<8> opc, string OpcodeStr,			multiclass fma3s_rm_int<bits<8> opc, string OpcodeStr,
	Operand memopr, RegisterClass RC> {			Operand memopr, RegisterClass RC> {
	def r_Int : FMA3<opc, MRMSrcReg, (outs RC:$dst),			def r_Int : FMA3<opc, MRMSrcReg, (outs RC:$dst),
	(ins RC:$src1, RC:$src2, RC:$src3),			(ins RC:$src1, RC:$src2, RC:$src3),
	!strconcat(OpcodeStr,			!strconcat(OpcodeStr,
	"\t{$src3, $src2, $dst|$dst, $src2, $src3}"),			"\t{$src3, $src2, $dst|$dst, $src2, $src3}"),
	[]>;			[]>;

	let mayLoad = 1 in			let mayLoad = 1 in
	def m_Int : FMA3<opc, MRMSrcMem, (outs RC:$dst),			def m_Int : FMA3<opc, MRMSrcMem, (outs RC:$dst),
	(ins RC:$src1, RC:$src2, memopr:$src3),			(ins RC:$src1, RC:$src2, memopr:$src3),
	!strconcat(OpcodeStr,			!strconcat(OpcodeStr,
	"\t{$src3, $src2, $dst|$dst, $src2, $src3}"),			"\t{$src3, $src2, $dst|$dst, $src2, $src3}"),
	[]>;			[]>;
	}			}

	multiclass fma3s_forms<bits<8> opc132, bits<8> opc213, bits<8> opc231,			multiclass fma3s_forms<bits<8> opc132, bits<8> opc213, bits<8> opc231,
	string OpStr, string PackTy, string Suff,			string OpStr, string PackTy, string Suff,
	SDNode OpNode, RegisterClass RC,			SDNode OpNode, RegisterClass RC,
	X86MemOperand x86memop> {			X86MemOperand x86memop> {
	defm NAME#132#Suff : fma3s_rm<opc132, !strconcat(OpStr, "132", PackTy),			defm NAME#132#Suff : fma3s_rm<opc132, !strconcat(OpStr, "132", PackTy),
	x86memop, RC>;			x86memop, RC>, FMA3_132;
	defm NAME#213#Suff : fma3s_rm<opc213, !strconcat(OpStr, "213", PackTy),			defm NAME#213#Suff : fma3s_rm<opc213, !strconcat(OpStr, "213", PackTy),
	x86memop, RC, OpNode>;			x86memop, RC, OpNode>, FMA3_213;
	defm NAME#231#Suff : fma3s_rm<opc231, !strconcat(OpStr, "231", PackTy),			defm NAME#231#Suff : fma3s_rm<opc231, !strconcat(OpStr, "231", PackTy),
	x86memop, RC>;			x86memop, RC>, FMA3_231;
	}			}

	// The FMA 213 form is created for lowering of scalar FMA intrinsics			// The FMA 213 form is created for lowering of scalar FMA intrinsics
	// to machine instructions.			// to machine instructions.
	// The FMA 132 form can trivially be obtained by commuting the 2nd and 3rd			// The FMA 132 form can trivially be obtained by commuting the 2nd and 3rd
	// operands of the FMA 213 form.			// operands of the FMA 213 form.
	// The FMA 231 form can be obtained only by commuting the 1st operand of the			// The FMA 231 form can be obtained only by commuting the 1st operand of the
	// 213 or 132 forms, and only after special analysis of all uses of the			// 213 or 132 forms, and only after special analysis of all uses of the
	// initial instruction. Such analysis does not exist yet, so the 231 form of			// initial instruction. Such analysis does not exist yet, so the 231 form of
	// the FMA*_Int instructions is introduced on the optimistic assumption that			// the FMA*_Int instructions is introduced on the optimistic assumption that
	// such analysis will be implemented eventually.			// such analysis will be implemented eventually.
	multiclass fma3s_int_forms<bits<8> opc132, bits<8> opc213, bits<8> opc231,			multiclass fma3s_int_forms<bits<8> opc132, bits<8> opc213, bits<8> opc231,
	string OpStr, string PackTy, string Suff,			string OpStr, string PackTy, string Suff,
	RegisterClass RC, Operand memop> {			RegisterClass RC, Operand memop> {
	defm NAME#132#Suff : fma3s_rm_int<opc132, !strconcat(OpStr, "132", PackTy),			defm NAME#132#Suff : fma3s_rm_int<opc132, !strconcat(OpStr, "132", PackTy),
	memop, RC>;			memop, RC>, FMA3_132;
	defm NAME#213#Suff : fma3s_rm_int<opc213, !strconcat(OpStr, "213", PackTy),			defm NAME#213#Suff : fma3s_rm_int<opc213, !strconcat(OpStr, "213", PackTy),
	memop, RC>;			memop, RC>, FMA3_213;
	defm NAME#231#Suff : fma3s_rm_int<opc231, !strconcat(OpStr, "231", PackTy),			defm NAME#231#Suff : fma3s_rm_int<opc231, !strconcat(OpStr, "231", PackTy),
	memop, RC>;			memop, RC>, FMA3_231;
	}			}

	multiclass fma3s<bits<8> opc132, bits<8> opc213, bits<8> opc231,			multiclass fma3s<bits<8> opc132, bits<8> opc213, bits<8> opc231,
	string OpStr, Intrinsic IntF32, Intrinsic IntF64,			string OpStr, Intrinsic IntF32, Intrinsic IntF64,
	SDNode OpNode> {			SDNode OpNode> {
	let ExeDomain = SSEPackedSingle in			let ExeDomain = SSEPackedSingle in
	defm NAME : fma3s_forms<opc132, opc213, opc231, OpStr, "ss", "SS", OpNode,			defm NAME : fma3s_forms<opc132, opc213, opc231, OpStr, "ss", "SS", OpNode,
	FR32, f32mem>,			FR32, f32mem>,

lib/Target/X86/X86InstrFMA3Info.h

	//===-- X86InstrFMA3Info.h - X86 FMA3 Instruction Information -------------===//			//===-- X86InstrFMA3Info.h - X86 FMA3 Instruction Information -------------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This file contains the implementation of the classes providing information			// This file contains tables that group FMA3 instructions together.
	// about existing X86 FMA3 opcodes, classifying and grouping them.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_LIB_TARGET_X86_UTILS_X86INSTRFMA3INFO_H			#ifndef LLVM_LIB_TARGET_X86_X86INSTRFMA3INFO_H
	#define LLVM_LIB_TARGET_X86_UTILS_X86INSTRFMA3INFO_H			#define LLVM_LIB_TARGET_X86_X86INSTRFMA3INFO_H

	#include "X86.h"			#include "MCTargetDesc/X86BaseInfo.h"
	#include "llvm/ADT/DenseMap.h"			#include "llvm/ADT/ArrayRef.h"
				#include "llvm/ADT/STLExtras.h"
				#include <algorithm>
	#include <cassert>			#include <cassert>
	#include <set>

	using namespace llvm;			using namespace llvm;

	/// This class is used to group {132, 213, 231} forms of FMA opcodes together.			#define FMA3SET(Name, Suffix) \
	/// Each of the groups has either 3 register opcodes, 3 memory opcodes,			{ { X86::Name##132##Suffix, X86::Name##213##Suffix, X86::Name##231##Suffix } },
	/// or 6 register and memory opcodes. Also, each group has an attributes field
	/// describing it.
	class X86InstrFMA3Group {
	private:
	/// Reference to an array holding 3 forms of register FMA opcodes.
	/// It may be set to nullptr if the group of FMA opcodes does not have
	/// any register form opcodes.
	const uint16_t *RegOpcodes;

	/// Reference to an array holding 3 forms of memory FMA opcodes.
	/// It may be set to nullptr if the group of FMA opcodes does not have
	/// any memory form opcodes.
	const uint16_t *MemOpcodes;

	/// This bitfield specifies the attributes associated with the created
	/// FMA groups of opcodes.
	unsigned Attributes;

	static const unsigned Form132 = 0;
	static const unsigned Form213 = 1;
	static const unsigned Form231 = 2;

	public:
	/// This bit must be set in the 'Attributes' field of FMA group if such
	/// group of FMA opcodes consists of FMA intrinsic opcodes.
	static const unsigned X86FMA3Intrinsic = 0x1;

	/// This bit must be set in the 'Attributes' field of FMA group if such
	/// group of FMA opcodes consists of AVX512 opcodes accepting a k-mask and
	/// passing the elements from the 1st operand to the result of the operation
	/// when the corresponding bits in the k-mask are unset.
	static const unsigned X86FMA3KMergeMasked = 0x2;

	/// This bit must be set in the 'Attributes' field of FMA group if such
	/// group of FMA opcodes consists of AVX512 opcodes accepting a k-zeromask.
	static const unsigned X86FMA3KZeroMasked = 0x4;

	/// Constructor. Creates a new group of FMA opcodes with three register form
	/// FMA opcodes \p RegOpcodes and three memory form FMA opcodes \p MemOpcodes.
	/// The parameters \p RegOpcodes and \p MemOpcodes may be set to nullptr,
	/// which means that the created group of FMA opcodes does not have the
	/// corresponding (register or memory) opcodes.
	/// The parameter \p Attr specifies the attributes describing the created
	/// group.
	X86InstrFMA3Group(const uint16_t *RegOpcodes, const uint16_t *MemOpcodes,
	unsigned Attr)
	: RegOpcodes(RegOpcodes), MemOpcodes(MemOpcodes), Attributes(Attr) {
	assert((RegOpcodes || MemOpcodes) &&
	"Cannot create a group not having any opcodes.");
	}

	/// Returns a memory form opcode that is the equivalent of the given register
	/// form opcode \p RegOpcode. 0 is returned if the group does not have
	/// either register or memory opcodes.
	unsigned getMemOpcode(unsigned RegOpcode) const {
	if (!RegOpcodes || !MemOpcodes)
	return 0;
	for (unsigned Form = 0; Form < 3; Form++)
	if (RegOpcodes[Form] == RegOpcode)
	return MemOpcodes[Form];
	return 0;
	}

	/// Returns the 132 form of FMA register opcode.
	unsigned getReg132Opcode() const {
	assert(RegOpcodes && "The group does not have register opcodes.");
	return RegOpcodes[Form132];
	}

	/// Returns the 213 form of FMA register opcode.
	unsigned getReg213Opcode() const {
	assert(RegOpcodes && "The group does not have register opcodes.");
	return RegOpcodes[Form213];
	}

	/// Returns the 231 form of FMA register opcode.
	unsigned getReg231Opcode() const {
	assert(RegOpcodes && "The group does not have register opcodes.");
	return RegOpcodes[Form231];
	}

	/// Returns the 132 form of FMA memory opcode.
	unsigned getMem132Opcode() const {
	assert(MemOpcodes && "The group does not have memory opcodes.");
	return MemOpcodes[Form132];
	}

	/// Returns the 213 form of FMA memory opcode.			#define FMA3_PACKED_SIZES_AVX512(Name, Type, Suffix) \
	unsigned getMem213Opcode() const {			FMA3SET(Name, Type##Z128##Suffix) \
	assert(MemOpcodes && "The group does not have memory opcodes.");			FMA3SET(Name, Type##Z128##Suffix##k) \
	return MemOpcodes[Form213];			FMA3SET(Name, Type##Z128##Suffix##kz) \
	}			FMA3SET(Name, Type##Z256##Suffix) \
				FMA3SET(Name, Type##Z256##Suffix##k) \
	/// Returns the 231 form of FMA memory opcode.			FMA3SET(Name, Type##Z256##Suffix##kz) \
	unsigned getMem231Opcode() const {			FMA3SET(Name, Type##Z##Suffix) \
	assert(MemOpcodes && "The group does not have memory opcodes.");			FMA3SET(Name, Type##Z##Suffix##k) \
	return MemOpcodes[Form231];			FMA3SET(Name, Type##Z##Suffix##kz)
	}
				#define FMA3_PACKED_GROUP_AVX512(Name, Suffix) \
	/// Returns true iff the group of FMA opcodes holds intrinsic opcodes.			FMA3_PACKED_SIZES_AVX512(Name, PD, Suffix) \
	bool isIntrinsic() const { return (Attributes & X86FMA3Intrinsic) != 0; }			FMA3_PACKED_SIZES_AVX512(Name, PS, Suffix)

	/// Returns true iff the group of FMA opcodes holds k-merge-masked opcodes.			#define FMA3_FULL_GROUP_AVX512(Name, Suffix) \
	bool isKMergeMasked() const {			FMA3_PACKED_GROUP_AVX512(Name, Suffix)
	return (Attributes & X86FMA3KMergeMasked) != 0;
	}			#define FMA3_PACKED_SIZES_MASKED(Name, Type, Suffix) \
				FMA3SET(Name, Type##Z128##Suffix##k) \
				FMA3SET(Name, Type##Z128##Suffix##kz) \
				FMA3SET(Name, Type##Z256##Suffix##k) \
				FMA3SET(Name, Type##Z256##Suffix##kz) \
				FMA3SET(Name, Type##Z##Suffix##k) \
				FMA3SET(Name, Type##Z##Suffix##kz)

				#define FMA3_PACKED_GROUP_MASKED(Name, Suffix) \
				FMA3_PACKED_SIZES_MASKED(Name, PD, Suffix) \
				FMA3_PACKED_SIZES_MASKED(Name, PS, Suffix)

				#define FMA3_FULL_GROUP_MASKED(Name, Suffix) \
				FMA3_PACKED_GROUP_MASKED(Name, Suffix) \
				FMA3SET(Name, SDZ##Suffix##_Intk) \
				FMA3SET(Name, SDZ##Suffix##_Intkz) \
				FMA3SET(Name, SSZ##Suffix##_Intk) \
				FMA3SET(Name, SSZ##Suffix##_Intkz)

				#define FMA3_OPCODES_MASKED(Suffix) \
				FMA3_FULL_GROUP_MASKED(VFMADD, Suffix) \
				FMA3_PACKED_GROUP_MASKED(VFMADDSUB, Suffix) \
				FMA3_FULL_GROUP_MASKED(VFMSUB, Suffix) \
				FMA3_PACKED_GROUP_MASKED(VFMSUBADD, Suffix) \
				FMA3_FULL_GROUP_MASKED(VFNMADD, Suffix) \
				FMA3_FULL_GROUP_MASKED(VFNMSUB, Suffix)

				#define FMA3_PACKED_GROUP_ROUND(Name, Suffix) \
				FMA3SET(Name, PDZ##Suffix) \
				FMA3SET(Name, PDZ##Suffix##k) \
				FMA3SET(Name, PDZ##Suffix##kz) \
				FMA3SET(Name, PSZ##Suffix) \
				FMA3SET(Name, PSZ##Suffix##k) \
				FMA3SET(Name, PSZ##Suffix##kz)

				#define FMA3_FULL_GROUP_ROUND(Name, Suffix) \
				FMA3_PACKED_GROUP_ROUND(Name, Suffix) \
				FMA3SET(Name, SDZ##Suffix##_Int) \
				FMA3SET(Name, SDZ##Suffix##_Intk) \
				FMA3SET(Name, SDZ##Suffix##_Intkz) \
				FMA3SET(Name, SSZ##Suffix##_Int) \
				FMA3SET(Name, SSZ##Suffix##_Intk) \
				FMA3SET(Name, SSZ##Suffix##_Intkz)

				#define FMA3_PACKED_SIZES(Name, Type, Suffix) \
				FMA3SET(Name, Type##Y##Suffix) \
				FMA3SET(Name, Type##Z128##Suffix) \
				FMA3SET(Name, Type##Z256##Suffix) \
				FMA3SET(Name, Type##Z##Suffix) \
				FMA3SET(Name, Type##Suffix)

				#define FMA3_SCALAR_SIZES(Name, Type, Suffix) \
				FMA3SET(Name, Type##Z##Suffix) \
				FMA3SET(Name, Type##Z##Suffix##_Int) \
				FMA3SET(Name, Type##Suffix) \
				FMA3SET(Name, Type##Suffix##_Int)

				#define FMA3_PACKED_GROUP(Name, Suffix) \
				FMA3_PACKED_SIZES(Name, PD, Suffix) \
				FMA3_PACKED_SIZES(Name, PS, Suffix)

				#define FMA3_FULL_GROUP(Name, Suffix) \
				FMA3_PACKED_GROUP(Name, Suffix) \
				FMA3_SCALAR_SIZES(Name, SD, Suffix) \
				FMA3_SCALAR_SIZES(Name, SS, Suffix)

				#define FMA3_OPCODES(Suffix) \
				FMA3_FULL_GROUP(VFMADD, Suffix) \
				FMA3_PACKED_GROUP(VFMADDSUB, Suffix) \
				FMA3_FULL_GROUP(VFMSUB, Suffix) \
				FMA3_PACKED_GROUP(VFMSUBADD, Suffix) \
				FMA3_FULL_GROUP(VFNMADD, Suffix) \
				FMA3_FULL_GROUP(VFNMSUB, Suffix)

				// All of the simple unmasked register opcodes.
				static const X86FMA3Group FMA3RegOpcodes[] = {
				FMA3_OPCODES(r)
				};

	/// Returns true iff the group of FMA opcodes holds k-zero-masked opcodes.			// All of the simple unmasked memory opcodes.
	bool isKZeroMasked() const { return (Attributes & X86FMA3KZeroMasked) != 0; }			static const X86FMA3Group FMA3MemOpcodes[] = {
				FMA3_OPCODES(m)
				};

	/// Returns true iff the group of FMA opcodes holds any of k-masked opcodes.			// All of the simple masked register opcodes.
	bool isKMasked() const {			static const X86FMA3Group FMA3RegMaskedOpcodes[] = {
	return (Attributes & (X86FMA3KMergeMasked | X86FMA3KZeroMasked)) != 0;			FMA3_OPCODES_MASKED(r)
	}			};

	/// Returns true iff the given \p Opcode is a register opcode from the			// All of the simple masked memory opcodes.
	/// groups of FMA opcodes.			static const X86FMA3Group FMA3MemMaskedOpcodes[] = {
	bool isRegOpcodeFromGroup(unsigned Opcode) const {			FMA3_OPCODES_MASKED(m)
	if (!RegOpcodes)			};
	return false;
	for (unsigned Form = 0; Form < 3; Form++)
	if (Opcode == RegOpcodes[Form])
	return true;
	return false;
	}

	/// Returns true iff the given \p Opcode is a memory opcode from the			// All of the opcodes with builtin rounding control.
	/// groups of FMA opcodes.			static const X86FMA3Group FMA3RoundOpcodes[] = {
	bool isMemOpcodeFromGroup(unsigned Opcode) const {			FMA3_FULL_GROUP_ROUND(VFMADD, rb) \
	if (!MemOpcodes)			FMA3_PACKED_GROUP_ROUND(VFMADDSUB, rb) \
	return false;			FMA3_FULL_GROUP_ROUND(VFMSUB, rb) \
	for (unsigned Form = 0; Form < 3; Form++)			FMA3_PACKED_GROUP_ROUND(VFMSUBADD, rb) \
	if (Opcode == MemOpcodes[Form])			FMA3_FULL_GROUP_ROUND(VFNMADD, rb) \
	return true;			FMA3_FULL_GROUP_ROUND(VFNMSUB, rb)
	return false;
	}
	};			};

	/// This class provides information about all existing FMA3 opcodes			// All of the broadcast opcodes.
	///			static const X86FMA3Group FMA3BroadcastOpcodes[] = {
	class X86InstrFMA3Info {			FMA3_FULL_GROUP_AVX512(VFMADD, mb) \
	private:			FMA3_PACKED_GROUP_AVX512(VFMADDSUB, mb) \
	/// A map that is used to find the group of FMA opcodes using any FMA opcode			FMA3_FULL_GROUP_AVX512(VFMSUB, mb) \
	/// from the group.			FMA3_PACKED_GROUP_AVX512(VFMSUBADD, mb) \
	DenseMap<unsigned, const X86InstrFMA3Group *> OpcodeToGroup;			FMA3_FULL_GROUP_AVX512(VFNMADD, mb) \
				FMA3_FULL_GROUP_AVX512(VFNMSUB, mb)
	/// Creates groups of FMA opcodes and initializes Opcode-to-Group map.			};
	/// This method can be called many times, but the actual initialization is
	/// called only once.
	static void initGroupsOnce();

	/// Creates groups of FMA opcodes and initializes Opcode-to-Group map.
	/// This method must be called ONLY from initGroupsOnce(). Otherwise, such
	/// call is not thread safe.
	void initGroupsOnceImpl();

	/// Creates one group of FMA opcodes having the register opcodes
	/// \p RegOpcodes and memory opcodes \p MemOpcodes. The parameter \p Attr
	/// specifies the attributes describing the created group.
	void initRMGroup(const uint16_t *RegOpcodes,
	const uint16_t *MemOpcodes, unsigned Attr = 0);

	/// Creates one group of FMA opcodes having only the register opcodes
	/// \p RegOpcodes. The parameter \p Attr specifies the attributes describing
	/// the created group.
	void initRGroup(const uint16_t *RegOpcodes, unsigned Attr = 0);

	/// Creates one group of FMA opcodes having only the memory opcodes
	/// \p MemOpcodes. The parameter \p Attr specifies the attributes describing
	/// the created group.
	void initMGroup(const uint16_t *MemOpcodes, unsigned Attr = 0);

	public:
	/// Returns the reference to an object of this class. It is assumed that
	/// only one object may exist.
	static X86InstrFMA3Info *getX86InstrFMA3Info();

	/// Constructor. Just creates an object of the class.
	X86InstrFMA3Info() {}

	/// Destructor. Deallocates the memory used for FMA3 Groups.
	~X86InstrFMA3Info() {
	std::set<const X86InstrFMA3Group *> DeletedGroups;
	auto E = OpcodeToGroup.end();
	for (auto I = OpcodeToGroup.begin(); I != E; I++) {
	const X86InstrFMA3Group *G = I->second;
	if (DeletedGroups.find(G) == DeletedGroups.end()) {
	DeletedGroups.insert(G);
	delete G;
	}
	}
	}

	/// Returns a reference to a group of FMA3 opcodes to where the given			static const X86FMA3Group *getFMA3Group(unsigned Opcode, uint64_t TSFlags) {
	/// \p Opcode is included. If the given \p Opcode is not recognized as FMA3			if (!X86II::isFMA3(TSFlags))
	/// and not included into any FMA3 group, then nullptr is returned.
	static const X86InstrFMA3Group *getFMA3Group(unsigned Opcode) {
	// Ensure that the groups of opcodes are initialized.
	initGroupsOnce();

	// Find the group including the given opcode.
	const X86InstrFMA3Info *FMA3Info = getX86InstrFMA3Info();
	auto I = FMA3Info->OpcodeToGroup.find(Opcode);
	if (I == FMA3Info->OpcodeToGroup.end())
	return nullptr;			return nullptr;

	return I->second;			bool IsMem = X86II::getMemoryOperandNo(TSFlags) != -1;
	}

	/// Returns true iff the given \p Opcode is recognized as FMA3 by this class.
	static bool isFMA3(unsigned Opcode) {
	return getFMA3Group(Opcode) != nullptr;
	}

	/// Iterator that is used to walk on FMA register opcodes having memory			// Determine which array we need to search based on a few attributes.
	/// form equivalents.			ArrayRef<X86FMA3Group> Groups;
	class rm_iterator {			if (TSFlags & X86II::EVEX_B) {
	private:			if (IsMem)
	/// Iterator associated with the OpcodeToGroup map. It must always be			Groups = FMA3BroadcastOpcodes;
	/// initialized with an entry from OpcodeToGroup for which I->first			else
	/// points to a register FMA opcode and I->second points to a group of			Groups = FMA3RoundOpcodes;
	/// FMA opcodes having memory form equivalent of I->first.			} else if (X86II::isKMasked(TSFlags)) {
	DenseMap<unsigned, const X86InstrFMA3Group *>::const_iterator I;			if (IsMem)
				Groups = FMA3MemMaskedOpcodes;
	public:			else
	/// Constructor. Creates rm_iterator. The parameter \p I must be an			Groups = FMA3RegMaskedOpcodes;
	/// iterator to OpcodeToGroup map entry having I->first pointing to			} else {
	/// register form FMA opcode and I->second pointing to a group of FMA			if (IsMem)
	/// opcodes holding memory form equivalent for I->first.			Groups = FMA3MemOpcodes;
	rm_iterator(DenseMap<unsigned, const X86InstrFMA3Group *>::const_iterator I)			else
	: I(I) {}			Groups = FMA3RegOpcodes;
				}
	/// Returns the register form FMA opcode.
	unsigned getRegOpcode() const { return I->first; };			unsigned FMA3Form = X86II::getFMA3Form(TSFlags);

	/// Returns the memory form equivalent opcode for FMA register opcode			auto I = std::lower_bound(Groups.begin(), Groups.end(),
	/// referenced by I->first.			Opcode,
	unsigned getMemOpcode() const {			[&](const X86FMA3Group &Group, unsigned Opcode) {
	unsigned Opcode = I->first;			return Group.Opcodes[FMA3Form] < Opcode;
	const X86InstrFMA3Group *Group = I->second;			});
	return Group->getMemOpcode(Opcode);			assert(I != Groups.end() && I->Opcodes[FMA3Form] == Opcode &&
				"Couldn't find FMA3 opcode!");
				return I;
				}

				static void verifyFMA3Tables() {
				assert((array_lengthof(FMA3RegOpcodes) == array_lengthof(FMA3MemOpcodes)) &&
				(array_lengthof(FMA3RegMaskedOpcodes) ==
				array_lengthof(FMA3MemMaskedOpcodes)) &&
				"FMA3 reg and mem opcodes tables should be the same size");
				assert(std::is_sorted(std::begin(FMA3RegOpcodes), std::end(FMA3RegOpcodes)) &&
				std::is_sorted(std::begin(FMA3RegMaskedOpcodes),
				std::end(FMA3RegMaskedOpcodes)) &&
				std::is_sorted(std::begin(FMA3RoundOpcodes),
				std::end(FMA3RoundOpcodes)) &&
				std::is_sorted(std::begin(FMA3MemOpcodes), std::end(FMA3MemOpcodes)) &&
				std::is_sorted(std::begin(FMA3MemMaskedOpcodes),
				std::end(FMA3MemMaskedOpcodes)) &&
				std::is_sorted(std::begin(FMA3BroadcastOpcodes),
				std::end(FMA3BroadcastOpcodes)) &&
				"FMA3 arrays should be sorted by opcode!");
	}			}

	/// Returns a reference to a group of FMA opcodes.
	const X86InstrFMA3Group *getGroup() const { return I->second; }

	bool operator==(const rm_iterator &OtherIt) const { return I == OtherIt.I; }
	bool operator!=(const rm_iterator &OtherIt) const { return I != OtherIt.I; }

	/// Increment. Advances the 'I' iterator to the next OpcodeToGroup entry
	/// having I->first pointing to register form FMA and I->second pointing
	/// to a group of FMA opcodes holding memory form equivalent for I->first.
	rm_iterator &operator++() {
	auto E = getX86InstrFMA3Info()->OpcodeToGroup.end();
	for (++I; I != E; ++I) {
	unsigned RegOpcode = I->first;
	const X86InstrFMA3Group *Group = I->second;
	if (Group->getMemOpcode(RegOpcode) != 0)
	break;
	}
	return *this;
	}
	};

	/// Returns rm_iterator pointing to the first entry of OpcodeToGroup map
	/// with a register FMA opcode having memory form opcode equivalent.
	static rm_iterator rm_begin() {
	initGroupsOnce();
	const X86InstrFMA3Info *FMA3Info = getX86InstrFMA3Info();
	auto I = FMA3Info->OpcodeToGroup.begin();
	auto E = FMA3Info->OpcodeToGroup.end();
	while (I != E) {
	unsigned Opcode = I->first;
	const X86InstrFMA3Group *G = I->second;
	if (G->getMemOpcode(Opcode) != 0)
	break;
	I++;
	}
	return rm_iterator(I);
	}

	/// Returns the last rm_iterator.
	static rm_iterator rm_end() {
	initGroupsOnce();
	return rm_iterator(getX86InstrFMA3Info()->OpcodeToGroup.end());
	}
	};

	#endif			#endif
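
For readers following the new lookup, the following is a minimal standalone sketch of the idea behind getFMA3Group and the row-for-row register/memory pairing that verifyFMA3Tables checks. It is not LLVM code: the Group struct, the opcode numbers, and the findRow and regToMem helpers are hypothetical stand-ins, and the form column is passed in directly, whereas the patch derives both the form and the table choice from TSFlags.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>

// Hypothetical stand-in for one row of an FMA3 table: one opcode per form.
struct Group {
  uint16_t Opcodes[3]; // column 0 = 132, column 1 = 213, column 2 = 231
};

// Made-up opcode numbers. Every column is ascending, so a table sorted by
// one form is sorted by the other two as well.
static const Group RegTable[] = {{{10, 11, 12}}, {{20, 21, 22}}, {{30, 31, 32}}};
static const Group MemTable[] = {{{13, 14, 15}}, {{23, 24, 25}}, {{33, 34, 35}}};

// Binary search over a single column of a table.
// Returns the matching row, or nullptr if the opcode is not in that column.
template <std::size_t N>
const Group *findRow(const Group (&Table)[N], unsigned Form, unsigned Opcode) {
  auto I = std::lower_bound(std::begin(Table), std::end(Table), Opcode,
                            [Form](const Group &G, unsigned Opc) {
                              return G.Opcodes[Form] < Opc;
                            });
  if (I == std::end(Table) || I->Opcodes[Form] != Opcode)
    return nullptr;
  return I;
}

// Register-to-memory conversion: find the row in the register table and
// read the same row and column from the memory table.
// Returns 0 when the register opcode is not found.
unsigned regToMem(unsigned Form, unsigned RegOpcode) {
  static_assert(sizeof(RegTable) == sizeof(MemTable),
                "reg and mem tables must have the same number of rows");
  const Group *Row = findRow(RegTable, Form, RegOpcode);
  if (!Row)
    return 0;
  return MemTable[Row - RegTable].Opcodes[Form];
}
```

Unlike the patch, this sketch returns nullptr on a miss instead of asserting, which keeps the helper usable in builds where assertions are compiled out.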

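The table-building macros in the header rely on token pasting to produce one row per group. The sketch below shows the same pattern on a tiny scale, assuming a hypothetical Opc namespace with six hand-numbered opcodes in alphabetical order; the SET and PACKED macros are simplified stand-ins for FMA3SET and FMA3_PACKED_GROUP, not the definitions from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical opcode enum, numbered in alphabetical order by name, which
// mirrors the assumption the patch summary makes about generated opcodes.
namespace Opc {
enum : uint16_t {
  VFMADD132PDr = 100,
  VFMADD132PSr,
  VFMADD213PDr,
  VFMADD213PSr,
  VFMADD231PDr,
  VFMADD231PSr
};
} // namespace Opc

struct Group {
  uint16_t Opcodes[3]; // 132, 213, 231
};

// One table row: the three forms of a single instruction.
#define SET(Name, Suffix)                                                      \
  {{Opc::Name##132##Suffix, Opc::Name##213##Suffix, Opc::Name##231##Suffix}},

// Two rows: the PD and PS variants, listed in alphabetical order.
#define PACKED(Name, Suffix)                                                   \
  SET(Name, PD##Suffix)                                                        \
  SET(Name, PS##Suffix)

static const Group Table[] = {PACKED(VFMADD, r)};
static const std::size_t NumRows = sizeof(Table) / sizeof(Table[0]);
```

Because the rows are emitted in the same alphabetical order as the enum, each column of Table ends up ascending, which is the property the binary search depends on.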
lib/Target/X86/X86InstrFMA3Info.cpp

This file was deleted.

	//===-- X86InstrFMA3Info.cpp - X86 FMA3 Instruction Information -----------===//
	//
	// The LLVM Compiler Infrastructure
	//
	// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.
	//
	//===----------------------------------------------------------------------===//
	//
	// This file contains the implementation of the classes providing information
	// about existing X86 FMA3 opcodes, classifying and grouping them.
	//
	//===----------------------------------------------------------------------===//

	#include "X86InstrFMA3Info.h"
	#include "X86InstrInfo.h"
	#include "llvm/Support/ManagedStatic.h"
	#include "llvm/Support/Threading.h"

	/// This flag is used in the method llvm::call_once() used below to make the
	/// initialization of the map 'OpcodeToGroup' thread safe.
	LLVM_DEFINE_ONCE_FLAG(InitGroupsOnceFlag);

	static ManagedStatic<X86InstrFMA3Info> X86InstrFMA3InfoObj;
	X86InstrFMA3Info *X86InstrFMA3Info::getX86InstrFMA3Info() {
	return &*X86InstrFMA3InfoObj;
	}

	void X86InstrFMA3Info::initRMGroup(const uint16_t *RegOpcodes,
	const uint16_t *MemOpcodes, unsigned Attr) {
	// Create a new instance of this class that would hold a group of FMA opcodes.
	X86InstrFMA3Group *G = new X86InstrFMA3Group(RegOpcodes, MemOpcodes, Attr);

	// Add the references from individual opcodes to the group holding them.
	assert((!OpcodeToGroup[RegOpcodes[0]] && !OpcodeToGroup[RegOpcodes[1]] &&
	!OpcodeToGroup[RegOpcodes[2]] && !OpcodeToGroup[MemOpcodes[0]] &&
	!OpcodeToGroup[MemOpcodes[1]] && !OpcodeToGroup[MemOpcodes[2]]) &&
	"Duplication or rewrite of elements in OpcodeToGroup.");
	OpcodeToGroup[RegOpcodes[0]] = G;
	OpcodeToGroup[RegOpcodes[1]] = G;
	OpcodeToGroup[RegOpcodes[2]] = G;
	OpcodeToGroup[MemOpcodes[0]] = G;
	OpcodeToGroup[MemOpcodes[1]] = G;
	OpcodeToGroup[MemOpcodes[2]] = G;
	}

	void X86InstrFMA3Info::initRGroup(const uint16_t *RegOpcodes, unsigned Attr) {
	// Create a new instance of this class that would hold a group of FMA opcodes.
	X86InstrFMA3Group *G = new X86InstrFMA3Group(RegOpcodes, nullptr, Attr);

	// Add the references from individual opcodes to the group holding them.
	assert((!OpcodeToGroup[RegOpcodes[0]] && !OpcodeToGroup[RegOpcodes[1]] &&
	!OpcodeToGroup[RegOpcodes[2]]) &&
	"Duplication or rewrite of elements in OpcodeToGroup.");
	OpcodeToGroup[RegOpcodes[0]] = G;
	OpcodeToGroup[RegOpcodes[1]] = G;
	OpcodeToGroup[RegOpcodes[2]] = G;
	}

	void X86InstrFMA3Info::initMGroup(const uint16_t *MemOpcodes, unsigned Attr) {
	// Create a new instance of this class that would hold a group of FMA opcodes.
	X86InstrFMA3Group *G = new X86InstrFMA3Group(nullptr, MemOpcodes, Attr);

	// Add the references from individual opcodes to the group holding them.
	assert((!OpcodeToGroup[MemOpcodes[0]] && !OpcodeToGroup[MemOpcodes[1]] &&
	!OpcodeToGroup[MemOpcodes[2]]) &&
	"Duplication or rewrite of elements in OpcodeToGroup.");
	OpcodeToGroup[MemOpcodes[0]] = G;
	OpcodeToGroup[MemOpcodes[1]] = G;
	OpcodeToGroup[MemOpcodes[2]] = G;
	}

	#define FMA3RM(R132, R213, R231, M132, M213, M231) \
	static const uint16_t Reg##R132[3] = {X86::R132, X86::R213, X86::R231}; \
	static const uint16_t Mem##R132[3] = {X86::M132, X86::M213, X86::M231}; \
	initRMGroup(Reg##R132, Mem##R132);

	#define FMA3RMA(R132, R213, R231, M132, M213, M231, Attrs) \
	static const uint16_t Reg##R132[3] = {X86::R132, X86::R213, X86::R231}; \
	static const uint16_t Mem##R132[3] = {X86::M132, X86::M213, X86::M231}; \
	initRMGroup(Reg##R132, Mem##R132, (Attrs));

	#define FMA3R(R132, R213, R231) \
	static const uint16_t Reg##R132[3] = {X86::R132, X86::R213, X86::R231}; \
	initRGroup(Reg##R132);

	#define FMA3RA(R132, R213, R231, Attrs) \
	static const uint16_t Reg##R132[3] = {X86::R132, X86::R213, X86::R231}; \
	initRGroup(Reg##R132, (Attrs));

	#define FMA3M(M132, M213, M231) \
	static const uint16_t Mem##M132[3] = {X86::M132, X86::M213, X86::M231}; \
	initMGroup(Mem##M132);

	#define FMA3MA(M132, M213, M231, Attrs) \
	static const uint16_t Mem##M132[3] = {X86::M132, X86::M213, X86::M231}; \
	initMGroup(Mem##M132, (Attrs));

	#define FMA3_AVX2_VECTOR_GROUP(Name) \
	FMA3RM(Name##132PSr, Name##213PSr, Name##231PSr, \
	Name##132PSm, Name##213PSm, Name##231PSm); \
	FMA3RM(Name##132PDr, Name##213PDr, Name##231PDr, \
	Name##132PDm, Name##213PDm, Name##231PDm); \
	FMA3RM(Name##132PSYr, Name##213PSYr, Name##231PSYr, \
	Name##132PSYm, Name##213PSYm, Name##231PSYm); \
	FMA3RM(Name##132PDYr, Name##213PDYr, Name##231PDYr, \
	Name##132PDYm, Name##213PDYm, Name##231PDYm);

	#define FMA3_AVX2_SCALAR_GROUP(Name) \
	FMA3RM(Name##132SSr, Name##213SSr, Name##231SSr, \
	Name##132SSm, Name##213SSm, Name##231SSm); \
	FMA3RM(Name##132SDr, Name##213SDr, Name##231SDr, \
	Name##132SDm, Name##213SDm, Name##231SDm); \
	FMA3RMA(Name##132SSr_Int, Name##213SSr_Int, Name##231SSr_Int, \
	Name##132SSm_Int, Name##213SSm_Int, Name##231SSm_Int, \
	X86InstrFMA3Group::X86FMA3Intrinsic); \
	FMA3RMA(Name##132SDr_Int, Name##213SDr_Int, Name##231SDr_Int, \
	Name##132SDm_Int, Name##213SDm_Int, Name##231SDm_Int, \
	X86InstrFMA3Group::X86FMA3Intrinsic);

	#define FMA3_AVX2_FULL_GROUP(Name) \
	FMA3_AVX2_VECTOR_GROUP(Name); \
	FMA3_AVX2_SCALAR_GROUP(Name);

	#define FMA3_AVX512_VECTOR_GROUP(Name) \
	FMA3RM(Name##132PSZ128r, Name##213PSZ128r, Name##231PSZ128r, \
	Name##132PSZ128m, Name##213PSZ128m, Name##231PSZ128m); \
	FMA3RM(Name##132PDZ128r, Name##213PDZ128r, Name##231PDZ128r, \
	Name##132PDZ128m, Name##213PDZ128m, Name##231PDZ128m); \
	FMA3RM(Name##132PSZ256r, Name##213PSZ256r, Name##231PSZ256r, \
	Name##132PSZ256m, Name##213PSZ256m, Name##231PSZ256m); \
	FMA3RM(Name##132PDZ256r, Name##213PDZ256r, Name##231PDZ256r, \
	Name##132PDZ256m, Name##213PDZ256m, Name##231PDZ256m); \
	FMA3RM(Name##132PSZr, Name##213PSZr, Name##231PSZr, \
	Name##132PSZm, Name##213PSZm, Name##231PSZm); \
	FMA3RM(Name##132PDZr, Name##213PDZr, Name##231PDZr, \
	Name##132PDZm, Name##213PDZm, Name##231PDZm); \
	FMA3RMA(Name##132PSZ128rk, Name##213PSZ128rk, Name##231PSZ128rk, \
	Name##132PSZ128mk, Name##213PSZ128mk, Name##231PSZ128mk, \
	X86InstrFMA3Group::X86FMA3KMergeMasked); \
	FMA3RMA(Name##132PDZ128rk, Name##213PDZ128rk, Name##231PDZ128rk, \
	Name##132PDZ128mk, Name##213PDZ128mk, Name##231PDZ128mk, \
	X86InstrFMA3Group::X86FMA3KMergeMasked); \
	FMA3RMA(Name##132PSZ256rk, Name##213PSZ256rk, Name##231PSZ256rk, \
	Name##132PSZ256mk, Name##213PSZ256mk, Name##231PSZ256mk, \
	X86InstrFMA3Group::X86FMA3KMergeMasked); \
	FMA3RMA(Name##132PDZ256rk, Name##213PDZ256rk, Name##231PDZ256rk, \
	Name##132PDZ256mk, Name##213PDZ256mk, Name##231PDZ256mk, \
	X86InstrFMA3Group::X86FMA3KMergeMasked); \
	FMA3RMA(Name##132PSZrk, Name##213PSZrk, Name##231PSZrk, \
	Name##132PSZmk, Name##213PSZmk, Name##231PSZmk, \
	X86InstrFMA3Group::X86FMA3KMergeMasked); \
	FMA3RMA(Name##132PDZrk, Name##213PDZrk, Name##231PDZrk, \
	Name##132PDZmk, Name##213PDZmk, Name##231PDZmk, \
	X86InstrFMA3Group::X86FMA3KMergeMasked); \
	FMA3RMA(Name##132PSZ128rkz, Name##213PSZ128rkz, Name##231PSZ128rkz, \
	Name##132PSZ128mkz, Name##213PSZ128mkz, Name##231PSZ128mkz, \
	X86InstrFMA3Group::X86FMA3KZeroMasked); \
	FMA3RMA(Name##132PDZ128rkz, Name##213PDZ128rkz, Name##231PDZ128rkz, \
	Name##132PDZ128mkz, Name##213PDZ128mkz, Name##231PDZ128mkz, \
	X86InstrFMA3Group::X86FMA3KZeroMasked); \
	FMA3RMA(Name##132PSZ256rkz, Name##213PSZ256rkz, Name##231PSZ256rkz, \
	Name##132PSZ256mkz, Name##213PSZ256mkz, Name##231PSZ256mkz, \
	X86InstrFMA3Group::X86FMA3KZeroMasked); \
	FMA3RMA(Name##132PDZ256rkz, Name##213PDZ256rkz, Name##231PDZ256rkz, \
	Name##132PDZ256mkz, Name##213PDZ256mkz, Name##231PDZ256mkz, \
	X86InstrFMA3Group::X86FMA3KZeroMasked); \
	FMA3RMA(Name##132PSZrkz, Name##213PSZrkz, Name##231PSZrkz, \
	Name##132PSZmkz, Name##213PSZmkz, Name##231PSZmkz, \
	X86InstrFMA3Group::X86FMA3KZeroMasked); \
	FMA3RMA(Name##132PDZrkz, Name##213PDZrkz, Name##231PDZrkz, \
	Name##132PDZmkz, Name##213PDZmkz, Name##231PDZmkz, \
	X86InstrFMA3Group::X86FMA3KZeroMasked); \
	FMA3R(Name##132PSZrb, Name##213PSZrb, Name##231PSZrb); \
	FMA3R(Name##132PDZrb, Name##213PDZrb, Name##231PDZrb); \
	FMA3RA(Name##132PSZrbk, Name##213PSZrbk, Name##231PSZrbk, \
	X86InstrFMA3Group::X86FMA3KMergeMasked); \
	FMA3RA(Name##132PDZrbk, Name##213PDZrbk, Name##231PDZrbk, \
	X86InstrFMA3Group::X86FMA3KMergeMasked); \
	FMA3RA(Name##132PSZrbkz, Name##213PSZrbkz, Name##231PSZrbkz, \
	X86InstrFMA3Group::X86FMA3KZeroMasked); \
	FMA3RA(Name##132PDZrbkz, Name##213PDZrbkz, Name##231PDZrbkz, \
	X86InstrFMA3Group::X86FMA3KZeroMasked); \
	FMA3M(Name##132PSZ128mb, Name##213PSZ128mb, Name##231PSZ128mb); \
	FMA3M(Name##132PDZ128mb, Name##213PDZ128mb, Name##231PDZ128mb); \
	FMA3M(Name##132PSZ256mb, Name##213PSZ256mb, Name##231PSZ256mb); \
	FMA3M(Name##132PDZ256mb, Name##213PDZ256mb, Name##231PDZ256mb); \
	FMA3M(Name##132PSZmb, Name##213PSZmb, Name##231PSZmb); \
	FMA3M(Name##132PDZmb, Name##213PDZmb, Name##231PDZmb); \
	FMA3MA(Name##132PSZ128mbk, Name##213PSZ128mbk, Name##231PSZ128mbk, \
	X86InstrFMA3Group::X86FMA3KMergeMasked); \
	FMA3MA(Name##132PDZ128mbk, Name##213PDZ128mbk, Name##231PDZ128mbk, \
	X86InstrFMA3Group::X86FMA3KMergeMasked); \
	FMA3MA(Name##132PSZ256mbk, Name##213PSZ256mbk, Name##231PSZ256mbk, \
	X86InstrFMA3Group::X86FMA3KMergeMasked); \
	FMA3MA(Name##132PDZ256mbk, Name##213PDZ256mbk, Name##231PDZ256mbk, \
	X86InstrFMA3Group::X86FMA3KMergeMasked); \
	FMA3MA(Name##132PSZmbk, Name##213PSZmbk, Name##231PSZmbk, \
	X86InstrFMA3Group::X86FMA3KMergeMasked); \
	FMA3MA(Name##132PDZmbk, Name##213PDZmbk, Name##231PDZmbk, \
	X86InstrFMA3Group::X86FMA3KMergeMasked); \
	FMA3MA(Name##132PSZ128mbkz, Name##213PSZ128mbkz, Name##231PSZ128mbkz, \
	X86InstrFMA3Group::X86FMA3KZeroMasked); \
	FMA3MA(Name##132PDZ128mbkz, Name##213PDZ128mbkz, Name##231PDZ128mbkz, \
	X86InstrFMA3Group::X86FMA3KZeroMasked); \
	FMA3MA(Name##132PSZ256mbkz, Name##213PSZ256mbkz, Name##231PSZ256mbkz, \
	X86InstrFMA3Group::X86FMA3KZeroMasked); \
	FMA3MA(Name##132PDZ256mbkz, Name##213PDZ256mbkz, Name##231PDZ256mbkz, \
	X86InstrFMA3Group::X86FMA3KZeroMasked); \
	FMA3MA(Name##132PSZmbkz, Name##213PSZmbkz, Name##231PSZmbkz, \
	X86InstrFMA3Group::X86FMA3KZeroMasked); \
	FMA3MA(Name##132PDZmbkz, Name##213PDZmbkz, Name##231PDZmbkz, \
	X86InstrFMA3Group::X86FMA3KZeroMasked);

	#define FMA3_AVX512_SCALAR_GROUP(Name) \
	FMA3RM(Name##132SSZr, Name##213SSZr, Name##231SSZr, \
	Name##132SSZm, Name##213SSZm, Name##231SSZm); \
	FMA3RM(Name##132SDZr, Name##213SDZr, Name##231SDZr, \
	Name##132SDZm, Name##213SDZm, Name##231SDZm); \
	FMA3RMA(Name##132SSZr_Int, Name##213SSZr_Int, Name##231SSZr_Int, \
	Name##132SSZm_Int, Name##213SSZm_Int, Name##231SSZm_Int, \
	X86InstrFMA3Group::X86FMA3Intrinsic); \
	FMA3RMA(Name##132SDZr_Int, Name##213SDZr_Int, Name##231SDZr_Int, \
	Name##132SDZm_Int, Name##213SDZm_Int, Name##231SDZm_Int, \
	X86InstrFMA3Group::X86FMA3Intrinsic); \
	FMA3RMA(Name##132SSZr_Intk, Name##213SSZr_Intk, Name##231SSZr_Intk, \
	Name##132SSZm_Intk, Name##213SSZm_Intk, Name##231SSZm_Intk, \
	X86InstrFMA3Group::X86FMA3Intrinsic | \
	X86InstrFMA3Group::X86FMA3KMergeMasked); \
	FMA3RMA(Name##132SDZr_Intk, Name##213SDZr_Intk, Name##231SDZr_Intk, \
	Name##132SDZm_Intk, Name##213SDZm_Intk, Name##231SDZm_Intk, \
	X86InstrFMA3Group::X86FMA3Intrinsic | \
	X86InstrFMA3Group::X86FMA3KMergeMasked); \
	FMA3RMA(Name##132SSZr_Intkz, Name##213SSZr_Intkz, Name##231SSZr_Intkz, \
	Name##132SSZm_Intkz, Name##213SSZm_Intkz, Name##231SSZm_Intkz, \
	X86InstrFMA3Group::X86FMA3Intrinsic | \
	X86InstrFMA3Group::X86FMA3KZeroMasked); \
	FMA3RMA(Name##132SDZr_Intkz, Name##213SDZr_Intkz, Name##231SDZr_Intkz, \
	Name##132SDZm_Intkz, Name##213SDZm_Intkz, Name##231SDZm_Intkz, \
	X86InstrFMA3Group::X86FMA3Intrinsic | \
	X86InstrFMA3Group::X86FMA3KZeroMasked); \
	FMA3RA(Name##132SSZrb_Int, Name##213SSZrb_Int, Name##231SSZrb_Int, \
	X86InstrFMA3Group::X86FMA3Intrinsic); \
	FMA3RA(Name##132SDZrb_Int, Name##213SDZrb_Int, Name##231SDZrb_Int, \
	X86InstrFMA3Group::X86FMA3Intrinsic); \
	FMA3RA(Name##132SSZrb_Intk, Name##213SSZrb_Intk, Name##231SSZrb_Intk, \
	X86InstrFMA3Group::X86FMA3Intrinsic | \
	X86InstrFMA3Group::X86FMA3KMergeMasked); \
	FMA3RA(Name##132SDZrb_Intk, Name##213SDZrb_Intk, Name##231SDZrb_Intk, \
	X86InstrFMA3Group::X86FMA3Intrinsic | \
	X86InstrFMA3Group::X86FMA3KMergeMasked); \
	FMA3RA(Name##132SSZrb_Intkz, Name##213SSZrb_Intkz, Name##231SSZrb_Intkz, \
	X86InstrFMA3Group::X86FMA3Intrinsic | \
	X86InstrFMA3Group::X86FMA3KZeroMasked); \
	FMA3RA(Name##132SDZrb_Intkz, Name##213SDZrb_Intkz, Name##231SDZrb_Intkz, \
	X86InstrFMA3Group::X86FMA3Intrinsic | \
	X86InstrFMA3Group::X86FMA3KZeroMasked);

	#define FMA3_AVX512_FULL_GROUP(Name) \
	FMA3_AVX512_VECTOR_GROUP(Name); \
	FMA3_AVX512_SCALAR_GROUP(Name);

	void X86InstrFMA3Info::initGroupsOnceImpl() {
	FMA3_AVX2_FULL_GROUP(VFMADD);
	FMA3_AVX2_FULL_GROUP(VFMSUB);
	FMA3_AVX2_FULL_GROUP(VFNMADD);
	FMA3_AVX2_FULL_GROUP(VFNMSUB);

	FMA3_AVX2_VECTOR_GROUP(VFMADDSUB);
	FMA3_AVX2_VECTOR_GROUP(VFMSUBADD);

	FMA3_AVX512_FULL_GROUP(VFMADD);
	FMA3_AVX512_FULL_GROUP(VFMSUB);
	FMA3_AVX512_FULL_GROUP(VFNMADD);
	FMA3_AVX512_FULL_GROUP(VFNMSUB);

	FMA3_AVX512_VECTOR_GROUP(VFMADDSUB);
	FMA3_AVX512_VECTOR_GROUP(VFMSUBADD);
	}

	void X86InstrFMA3Info::initGroupsOnce() {
	llvm::call_once(InitGroupsOnceFlag,
	[]() { getX86InstrFMA3Info()->initGroupsOnceImpl(); });
	}

lib/Target/X86/X86InstrFormats.td

[164 lines not shown]
class AddressSize<bits<2> val> {		class AddressSize<bits<2> val> {
bits<2> Value = val;		bits<2> Value = val;
}		}
def AdSizeX : AddressSize<0>; // Address size determined using addr operand.		def AdSizeX : AddressSize<0>; // Address size determined using addr operand.
def AdSize16 : AddressSize<1>; // Encodes a 16-bit address.		def AdSize16 : AddressSize<1>; // Encodes a 16-bit address.
def AdSize32 : AddressSize<2>; // Encodes a 32-bit address.		def AdSize32 : AddressSize<2>; // Encodes a 32-bit address.
def AdSize64 : AddressSize<3>; // Encodes a 64-bit address.		def AdSize64 : AddressSize<3>; // Encodes a 64-bit address.

		// FMA3Format - This specifies what form this FMA3 instruction is. This is used
		// for FMA3 commuting.
		class FMA3Format<bits<2> val> {
		bits<2> Value = val;
		}
		def NotFMA3 : FMA3Format<0>;
		def FMA3_132 : FMA3Format<1>;
		def FMA3_213 : FMA3Format<2>;
		def FMA3_231 : FMA3Format<3>;

// Prefix byte classes which are used to indicate to the ad-hoc machine code		// Prefix byte classes which are used to indicate to the ad-hoc machine code
// emitter that various prefix bytes are required.		// emitter that various prefix bytes are required.
class OpSize16 { OperandSize OpSize = OpSize16; }		class OpSize16 { OperandSize OpSize = OpSize16; }
class OpSize32 { OperandSize OpSize = OpSize32; }		class OpSize32 { OperandSize OpSize = OpSize32; }
class AdSize16 { AddressSize AdSize = AdSize16; }		class AdSize16 { AddressSize AdSize = AdSize16; }
class AdSize32 { AddressSize AdSize = AdSize32; }		class AdSize32 { AddressSize AdSize = AdSize32; }
class AdSize64 { AddressSize AdSize = AdSize64; }		class AdSize64 { AddressSize AdSize = AdSize64; }
class REX_W { bit hasREX_WPrefix = 1; }		class REX_W { bit hasREX_WPrefix = 1; }
[26 lines not shown]
class EVEX_4V : VEX_4V { Encoding OpEnc = EncEVEX; }		class EVEX_4V : VEX_4V { Encoding OpEnc = EncEVEX; }
class EVEX_K { bit hasEVEX_K = 1; }		class EVEX_K { bit hasEVEX_K = 1; }
class EVEX_KZ : EVEX_K { bit hasEVEX_Z = 1; }		class EVEX_KZ : EVEX_K { bit hasEVEX_Z = 1; }
class EVEX_B { bit hasEVEX_B = 1; }		class EVEX_B { bit hasEVEX_B = 1; }
class EVEX_RC { bit hasEVEX_RC = 1; }		class EVEX_RC { bit hasEVEX_RC = 1; }
class EVEX_V512 { bit hasEVEX_L2 = 1; bit hasVEX_L = 0; }		class EVEX_V512 { bit hasEVEX_L2 = 1; bit hasVEX_L = 0; }
class EVEX_V256 { bit hasEVEX_L2 = 0; bit hasVEX_L = 1; }		class EVEX_V256 { bit hasEVEX_L2 = 0; bit hasVEX_L = 1; }
class EVEX_V128 { bit hasEVEX_L2 = 0; bit hasVEX_L = 0; }		class EVEX_V128 { bit hasEVEX_L2 = 0; bit hasVEX_L = 0; }
		class FMA3_132 { FMA3Format FMA3Form = FMA3_132; }
		class FMA3_213 { FMA3Format FMA3Form = FMA3_213; }
		class FMA3_231 { FMA3Format FMA3Form = FMA3_231; }

// Specify AVX512 8-bit compressed displacement encoding based on the vector		// Specify AVX512 8-bit compressed displacement encoding based on the vector
// element size in bits (8, 16, 32, 64) and the CDisp8 form.		// element size in bits (8, 16, 32, 64) and the CDisp8 form.
class EVEX_CD8<int esize, CD8VForm form> {		class EVEX_CD8<int esize, CD8VForm form> {
int CD8_EltSize = !srl(esize, 3);		int CD8_EltSize = !srl(esize, 3);
bits<3> CD8_Form = form.Value;		bits<3> CD8_Form = form.Value;
}		}

[56 lines not shown]	class X86Inst<bits<8> opcod, Format f, ImmType i, dag outs, dag ins,
bit hasEVEX_L2 = 0; // Does this inst set the EVEX_L2 field?		bit hasEVEX_L2 = 0; // Does this inst set the EVEX_L2 field?
bit hasEVEX_B = 0; // Does this inst set the EVEX_B field?		bit hasEVEX_B = 0; // Does this inst set the EVEX_B field?
bits<3> CD8_Form = 0; // Compressed disp8 form - vector-width.		bits<3> CD8_Form = 0; // Compressed disp8 form - vector-width.
// Declare it int rather than bits<4> so that all bits are defined when		// Declare it int rather than bits<4> so that all bits are defined when
// assigning to bits<7>.		// assigning to bits<7>.
int CD8_EltSize = 0; // Compressed disp8 form - element-size in bytes.		int CD8_EltSize = 0; // Compressed disp8 form - element-size in bytes.
bit has3DNow0F0FOpcode =0;// Wacky 3dNow! encoding?		bit has3DNow0F0FOpcode =0;// Wacky 3dNow! encoding?
bit hasEVEX_RC = 0; // Explicitly specified rounding control in FP instruction.		bit hasEVEX_RC = 0; // Explicitly specified rounding control in FP instruction.
		FMA3Format FMA3Form = NotFMA3; // What flavor of FMA3 is this?
		bit FMA3Intrinsic = 0; // Is this an FMA3 scalar intrinsic opcode.

bits<2> EVEX_LL;		bits<2> EVEX_LL;
let EVEX_LL{0} = hasVEX_L;		let EVEX_LL{0} = hasVEX_L;
let EVEX_LL{1} = hasEVEX_L2;		let EVEX_LL{1} = hasEVEX_L2;
// Vector size in bytes.		// Vector size in bytes.
bits<7> VectSize = !shl(16, EVEX_LL);		bits<7> VectSize = !shl(16, EVEX_LL);

// The scaling factor for AVX512's compressed displacement is either		// The scaling factor for AVX512's compressed displacement is either
[28 lines not shown]	class X86Inst<bits<8> opcod, Format f, ImmType i, dag outs, dag ins,
let TSFlags{42} = hasEVEX_K;		let TSFlags{42} = hasEVEX_K;
let TSFlags{43} = hasEVEX_Z;		let TSFlags{43} = hasEVEX_Z;
let TSFlags{44} = hasEVEX_L2;		let TSFlags{44} = hasEVEX_L2;
let TSFlags{45} = hasEVEX_B;		let TSFlags{45} = hasEVEX_B;
// If we run out of TSFlags bits, it's possible to encode this in 3 bits.		// If we run out of TSFlags bits, it's possible to encode this in 3 bits.
let TSFlags{52-46} = CD8_Scale;		let TSFlags{52-46} = CD8_Scale;
let TSFlags{53} = has3DNow0F0FOpcode;		let TSFlags{53} = has3DNow0F0FOpcode;
let TSFlags{54} = hasEVEX_RC;		let TSFlags{54} = hasEVEX_RC;
		let TSFlags{56-55} = FMA3Form.Value;
		let TSFlags{57} = FMA3Intrinsic;
}		}
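
The TableGen changes above place `FMA3Form.Value` in TSFlags bits 56-55 and `FMA3Intrinsic` in bit 57. A minimal sketch of how the C++ side could read those fields back is given here; the helper names (`decodeFMA3Form`, `decodeFMA3Intrinsic`, `encodeFMA3`) and the enum are hypothetical stand-ins, since the patch's own `X86II` accessors are not part of this excerpt.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the TableGen layout in this diff:
//   TSFlags{56-55} = FMA3Form.Value, TSFlags{57} = FMA3Intrinsic.
// The values follow the FMA3Format defs (NotFMA3 = 0 ... FMA3_231 = 3).
enum FMA3Form : unsigned { NotFMA3 = 0, Form132 = 1, Form213 = 2, Form231 = 3 };

constexpr unsigned FMA3FormShift = 55;
constexpr uint64_t FMA3FormMask = 0x3;
constexpr unsigned FMA3IntrinsicShift = 57;

// Extract the two-bit form field from a flags word.
inline FMA3Form decodeFMA3Form(uint64_t TSFlags) {
  return static_cast<FMA3Form>((TSFlags >> FMA3FormShift) & FMA3FormMask);
}

// Test the scalar-intrinsic bit.
inline bool decodeFMA3Intrinsic(uint64_t TSFlags) {
  return ((TSFlags >> FMA3IntrinsicShift) & 1) != 0;
}

// Build a flags word containing only the two FMA3 fields (illustration only).
inline uint64_t encodeFMA3(FMA3Form Form, bool Intrinsic) {
  return (static_cast<uint64_t>(Form) << FMA3FormShift) |
         (static_cast<uint64_t>(Intrinsic ? 1 : 0) << FMA3IntrinsicShift);
}
```

Because `NotFMA3` is encoded as 0, any instruction that never sets `FMA3Form` decodes as "not an FMA3 opcode" without extra bookkeeping.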

class PseudoI<dag oops, dag iops, list<dag> pattern>		class PseudoI<dag oops, dag iops, list<dag> pattern>
: X86Inst<0, Pseudo, NoImm, oops, iops, "", NoItinerary> {		: X86Inst<0, Pseudo, NoImm, oops, iops, "", NoItinerary> {
let Pattern = pattern;		let Pattern = pattern;
}		}

class I<bits<8> o, Format f, dag outs, dag ins, string asm,		class I<bits<8> o, Format f, dag outs, dag ins, string asm,
[619 lines not shown]

lib/Target/X86/X86InstrInfo.h

[9 lines not shown]
// This file contains the X86 implementation of the TargetInstrInfo class.		// This file contains the X86 implementation of the TargetInstrInfo class.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_LIB_TARGET_X86_X86INSTRINFO_H		#ifndef LLVM_LIB_TARGET_X86_X86INSTRINFO_H
#define LLVM_LIB_TARGET_X86_X86INSTRINFO_H		#define LLVM_LIB_TARGET_X86_X86INSTRINFO_H

#include "MCTargetDesc/X86BaseInfo.h"		#include "MCTargetDesc/X86BaseInfo.h"
#include "X86InstrFMA3Info.h"
#include "X86RegisterInfo.h"		#include "X86RegisterInfo.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/Target/TargetInstrInfo.h"		#include "llvm/Target/TargetInstrInfo.h"

#define GET_INSTRINFO_HEADER		#define GET_INSTRINFO_HEADER
#include "X86GenInstrInfo.inc"		#include "X86GenInstrInfo.inc"

namespace llvm {		namespace llvm {
[107 lines not shown]

inline static bool isMem(const MachineInstr &MI, unsigned Op) {		inline static bool isMem(const MachineInstr &MI, unsigned Op) {
if (MI.getOperand(Op).isFI())		if (MI.getOperand(Op).isFI())
return true;		return true;
return Op + X86::AddrNumOperands <= MI.getNumOperands() &&		return Op + X86::AddrNumOperands <= MI.getNumOperands() &&
MI.getOperand(Op + X86::AddrSegmentReg).isReg() && isLeaMem(MI, Op);		MI.getOperand(Op + X86::AddrSegmentReg).isReg() && isLeaMem(MI, Op);
}		}

		struct X86FMA3Group {
		uint16_t Opcodes[3];

		bool operator<(const X86FMA3Group &RHS) const {
		return Opcodes[0] < RHS.Opcodes[0];
		}
		};
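
The summary describes finding a group by binary-searching the column of a sorted table that matches the opcode's form. A rough sketch of that lookup, under stated assumptions, follows: `Group`, `Table`, `findGroup`, and the opcode numbers are all made up for illustration, and the patch's actual `getFMA3Group` is not shown in this excerpt.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Stand-in for X86FMA3Group: one row holds the 132, 213 and 231 opcodes.
struct Group {
  uint16_t Opcodes[3];
};

// Hypothetical table with invented opcode numbers. Every column is sorted,
// which is the property the summary relies on.
static const Group Table[] = {
    {{10, 11, 12}},
    {{20, 21, 22}},
    {{30, 31, 32}},
};

// Binary search over one column. FormIdx: 0 = 132, 1 = 213, 2 = 231.
// Returns the matching row, or nullptr if the opcode is not in that column.
const Group *findGroup(const Group *Begin, const Group *End, unsigned Opcode,
                       unsigned FormIdx) {
  assert(FormIdx < 3 && "form index out of range");
  const Group *I = std::lower_bound(
      Begin, End, Opcode, [FormIdx](const Group &G, unsigned Opc) {
        return G.Opcodes[FormIdx] < Opc;
      });
  if (I != End && I->Opcodes[FormIdx] == Opcode)
    return I;
  return nullptr;
}
```

Searching the column selected by the form means a single `lower_bound` call suffices; the other two opcodes of the group come from the same row.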

class X86InstrInfo final : public X86GenInstrInfo {		class X86InstrInfo final : public X86GenInstrInfo {
X86Subtarget &Subtarget;		X86Subtarget &Subtarget;
const X86RegisterInfo RI;		const X86RegisterInfo RI;

/// RegOp2MemOpTable3Addr, RegOp2MemOpTable0, RegOp2MemOpTable1,		/// RegOp2MemOpTable3Addr, RegOp2MemOpTable0, RegOp2MemOpTable1,
/// RegOp2MemOpTable2, RegOp2MemOpTable3 - Load / store folding opcode maps.		/// RegOp2MemOpTable2, RegOp2MemOpTable3 - Load / store folding opcode maps.
///		///
typedef DenseMap<unsigned,		typedef DenseMap<unsigned,
[121 lines not shown]	public:
/// \p SrcOpIdx1 and \p SrcOpIdx2 are INPUT and OUTPUT arguments.		/// \p SrcOpIdx1 and \p SrcOpIdx2 are INPUT and OUTPUT arguments.
/// The output indices of the commuted operands are returned in these		/// The output indices of the commuted operands are returned in these
/// arguments. Also, the input values of these arguments may be preset either		/// arguments. Also, the input values of these arguments may be preset either
/// to indices of operands that must be commuted or be equal to a special		/// to indices of operands that must be commuted or be equal to a special
/// value 'CommuteAnyOperandIndex' which means that the corresponding		/// value 'CommuteAnyOperandIndex' which means that the corresponding
/// operand index is not set and this method is free to pick any of		/// operand index is not set and this method is free to pick any of
/// available commutable operands.		/// available commutable operands.
/// The parameter \p FMA3Group keeps the reference to the group of relative		/// The parameter \p FMA3Group keeps the reference to the group of relative
/// FMA3 opcodes including register/memory forms of 132/213/231 opcodes.		/// FMA3 opcodes.
///		///
/// For example, calling this method this way:		/// For example, calling this method this way:
/// unsigned Idx1 = 1, Idx2 = CommuteAnyOperandIndex;		/// unsigned Idx1 = 1, Idx2 = CommuteAnyOperandIndex;
/// findFMA3CommutedOpIndices(MI, Idx1, Idx2, FMA3Group);		/// findFMA3CommutedOpIndices(MI, Idx1, Idx2, FMA3Group);
/// can be interpreted as a query asking if the operand #1 can be swapped		/// can be interpreted as a query asking if the operand #1 can be swapped
/// with any other available operand (e.g. operand #2, operand #3, etc.).		/// with any other available operand (e.g. operand #2, operand #3, etc.).
///		///
/// The returned FMA opcode may differ from the opcode in the given MI.		/// The returned FMA opcode may differ from the opcode in the given MI.
/// For example, commuting the operands #1 and #3 in the following FMA		/// For example, commuting the operands #1 and #3 in the following FMA
/// FMA213 #1, #2, #3		/// FMA213 #1, #2, #3
/// results into instruction with adjusted opcode:		/// results into instruction with adjusted opcode:
/// FMA231 #3, #2, #1		/// FMA231 #3, #2, #1
bool findFMA3CommutedOpIndices(const MachineInstr &MI,		bool findFMA3CommutedOpIndices(const MachineInstr &MI,
unsigned &SrcOpIdx1,		unsigned &SrcOpIdx1,
unsigned &SrcOpIdx2,		unsigned &SrcOpIdx2,
const X86InstrFMA3Group &FMA3Group) const;		const X86FMA3Group &FMA3Group) const;

/// Returns an adjusted FMA opcode that must be used in FMA instruction that		/// Returns an adjusted FMA opcode that must be used in FMA instruction that
/// performs the same computations as the given \p MI but which has the		/// performs the same computations as the given \p MI but which has the
/// operands \p SrcOpIdx1 and \p SrcOpIdx2 commuted.		/// operands \p SrcOpIdx1 and \p SrcOpIdx2 commuted.
/// It may return 0 if it is unsafe to commute the operands.		/// It may return 0 if it is unsafe to commute the operands.
/// Note that a machine instruction (instead of its opcode) is passed as the		/// Note that a machine instruction (instead of its opcode) is passed as the
/// first parameter to make it possible to analyze the instruction's uses and		/// first parameter to make it possible to analyze the instruction's uses and
/// commute the first operand of FMA even when it seems unsafe when you look		/// commute the first operand of FMA even when it seems unsafe when you look
/// at the opcode. For example, it is Ok to commute the first operand of		/// at the opcode. For example, it is Ok to commute the first operand of
/// VFMADD*SD_Int, if ONLY the lowest 64-bit element of the result is used.		/// VFMADD*SD_Int, if ONLY the lowest 64-bit element of the result is used.
///		///
/// The returned FMA opcode may differ from the opcode in the given \p MI.		/// The returned FMA opcode may differ from the opcode in the given \p MI.
/// For example, commuting the operands #1 and #3 in the following FMA		/// For example, commuting the operands #1 and #3 in the following FMA
/// FMA213 #1, #2, #3		/// FMA213 #1, #2, #3
/// results into instruction with adjusted opcode:		/// results into instruction with adjusted opcode:
/// FMA231 #3, #2, #1		/// FMA231 #3, #2, #1
unsigned getFMA3OpcodeToCommuteOperands(const MachineInstr &MI,		unsigned getFMA3OpcodeToCommuteOperands(const MachineInstr &MI,
unsigned SrcOpIdx1,		unsigned SrcOpIdx1,
unsigned SrcOpIdx2,		unsigned SrcOpIdx2,
const X86InstrFMA3Group &FMA3Group) const;		const X86FMA3Group &FMA3Group) const;

// Branch analysis.		// Branch analysis.
bool isUnpredicatedTerminator(const MachineInstr &MI) const override;		bool isUnpredicatedTerminator(const MachineInstr &MI) const override;
bool analyzeBranch(MachineBasicBlock &MBB, MachineBasicBlock *&TBB,		bool analyzeBranch(MachineBasicBlock &MBB, MachineBasicBlock *&TBB,
MachineBasicBlock *&FBB,		MachineBasicBlock *&FBB,
SmallVectorImpl<MachineOperand> &Cond,		SmallVectorImpl<MachineOperand> &Cond,
bool AllowModify) const override;		bool AllowModify) const override;

[262 lines not shown]

lib/Target/X86/X86InstrInfo.cpp


//===-- X86InstrInfo.cpp - X86 Instruction Information --------------------===//		//===-- X86InstrInfo.cpp - X86 Instruction Information --------------------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file contains the X86 implementation of the TargetInstrInfo class.		// This file contains the X86 implementation of the TargetInstrInfo class.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "X86InstrInfo.h"		#include "X86InstrInfo.h"
#include "X86.h"		#include "X86.h"
#include "X86InstrBuilder.h"		#include "X86InstrBuilder.h"
		#include "X86InstrFMA3Info.h"
#include "X86MachineFunctionInfo.h"		#include "X86MachineFunctionInfo.h"
#include "X86Subtarget.h"		#include "X86Subtarget.h"
#include "X86TargetMachine.h"		#include "X86TargetMachine.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/CodeGen/LivePhysRegs.h"		#include "llvm/CodeGen/LivePhysRegs.h"
#include "llvm/CodeGen/LiveVariables.h"		#include "llvm/CodeGen/LiveVariables.h"
#include "llvm/CodeGen/MachineConstantPool.h"		#include "llvm/CodeGen/MachineConstantPool.h"
#include "llvm/CodeGen/MachineDominators.h"		#include "llvm/CodeGen/MachineDominators.h"
[91 lines not shown]	X86InstrInfo::X86InstrInfo(X86Subtarget &STI)
: X86GenInstrInfo((STI.isTarget64BitLP64() ? X86::ADJCALLSTACKDOWN64		: X86GenInstrInfo((STI.isTarget64BitLP64() ? X86::ADJCALLSTACKDOWN64
: X86::ADJCALLSTACKDOWN32),		: X86::ADJCALLSTACKDOWN32),
(STI.isTarget64BitLP64() ? X86::ADJCALLSTACKUP64		(STI.isTarget64BitLP64() ? X86::ADJCALLSTACKUP64
: X86::ADJCALLSTACKUP32),		: X86::ADJCALLSTACKUP32),
X86::CATCHRET,		X86::CATCHRET,
(STI.is64Bit() ? X86::RETQ : X86::RETL)),		(STI.is64Bit() ? X86::RETQ : X86::RETL)),
Subtarget(STI), RI(STI.getTargetTriple()) {		Subtarget(STI), RI(STI.getTargetTriple()) {

		verifyFMA3Tables();

static const X86MemoryFoldTableEntry MemoryFoldTable2Addr[] = {		static const X86MemoryFoldTableEntry MemoryFoldTable2Addr[] = {
{ X86::ADC32ri, X86::ADC32mi, 0 },		{ X86::ADC32ri, X86::ADC32mi, 0 },
{ X86::ADC32ri8, X86::ADC32mi8, 0 },		{ X86::ADC32ri8, X86::ADC32mi8, 0 },
{ X86::ADC32rr, X86::ADC32mr, 0 },		{ X86::ADC32rr, X86::ADC32mr, 0 },
{ X86::ADC64ri32, X86::ADC64mi32, 0 },		{ X86::ADC64ri32, X86::ADC64mi32, 0 },
{ X86::ADC64ri8, X86::ADC64mi8, 0 },		{ X86::ADC64ri8, X86::ADC64mi8, 0 },
{ X86::ADC64rr, X86::ADC64mr, 0 },		{ X86::ADC64rr, X86::ADC64mr, 0 },
{ X86::ADD16ri, X86::ADD16mi, 0 },		{ X86::ADD16ri, X86::ADD16mi, 0 },
[1,882 lines not shown]
};		};

for (X86MemoryFoldTableEntry Entry : MemoryFoldTable3) {		for (X86MemoryFoldTableEntry Entry : MemoryFoldTable3) {
AddTableEntry(RegOp2MemOpTable3, MemOp2RegOpTable,		AddTableEntry(RegOp2MemOpTable3, MemOp2RegOpTable,
Entry.RegOp, Entry.MemOp,		Entry.RegOp, Entry.MemOp,
// Index 3, folded load		// Index 3, folded load
Entry.Flags | TB_INDEX_3 | TB_FOLDED_LOAD);		Entry.Flags | TB_INDEX_3 | TB_FOLDED_LOAD);
}		}
auto I = X86InstrFMA3Info::rm_begin();
auto E = X86InstrFMA3Info::rm_end();		// Add FMA3 instructions.
for (; I != E; ++I)		for (size_t i = 0; i != array_lengthof(FMA3RegOpcodes); ++i) {
if (!I.getGroup()->isKMasked())		for (size_t j = 0; j != 3; ++j) {
AddTableEntry(RegOp2MemOpTable3, MemOp2RegOpTable,		AddTableEntry(RegOp2MemOpTable3, MemOp2RegOpTable,
I.getRegOpcode(), I.getMemOpcode(),		FMA3RegOpcodes[i].Opcodes[j], FMA3MemOpcodes[i].Opcodes[j],
		// Index 3, folded load
TB_ALIGN_NONE | TB_INDEX_3 | TB_FOLDED_LOAD);		TB_ALIGN_NONE | TB_INDEX_3 | TB_FOLDED_LOAD);
		}
		}
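
The new loop on the right-hand side depends on the register and memory tables being row-parallel: row `i`, column `j` of one table corresponds to the same position in the other. A small self-contained sketch of that pairing is shown here; `RegTable`, `MemTable`, `buildFoldPairs`, and the opcode numbers are hypothetical and only illustrate the indexing scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in for X86FMA3Group: the 132, 213 and 231 opcodes of one group.
struct Group {
  uint16_t Opcodes[3];
};

// Invented opcode numbers. Row i of RegTable is the register form of
// row i of MemTable.
static const Group RegTable[] = {{{100, 101, 102}}, {{110, 111, 112}}};
static const Group MemTable[] = {{{200, 201, 202}}, {{210, 211, 212}}};
static_assert(sizeof(RegTable) == sizeof(MemTable),
              "register and memory tables must have the same number of rows");

// Collect (register opcode, memory opcode) pairs by walking both tables
// with the same row and column indices.
std::vector<std::pair<uint16_t, uint16_t>> buildFoldPairs() {
  std::vector<std::pair<uint16_t, uint16_t>> Pairs;
  const std::size_t NumRows = sizeof(RegTable) / sizeof(RegTable[0]);
  for (std::size_t i = 0; i != NumRows; ++i)
    for (std::size_t j = 0; j != 3; ++j)
      Pairs.emplace_back(RegTable[i].Opcodes[j], MemTable[i].Opcodes[j]);
  return Pairs;
}
```

The `static_assert` is one cheap way to catch the tables drifting out of step in size; it cannot check that the rows are in the same order.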

static const X86MemoryFoldTableEntry MemoryFoldTable4[] = {		static const X86MemoryFoldTableEntry MemoryFoldTable4[] = {
// AVX-512 foldable instructions		// AVX-512 foldable instructions
{ X86::VADDPSZrrk, X86::VADDPSZrmk, 0 },		{ X86::VADDPSZrrk, X86::VADDPSZrmk, 0 },
{ X86::VADDPDZrrk, X86::VADDPDZrmk, 0 },		{ X86::VADDPDZrrk, X86::VADDPDZrmk, 0 },
{ X86::VSUBPSZrrk, X86::VSUBPSZrmk, 0 },		{ X86::VSUBPSZrrk, X86::VSUBPSZrmk, 0 },
{ X86::VSUBPDZrrk, X86::VSUBPDZrmk, 0 },		{ X86::VSUBPDZrrk, X86::VSUBPDZrmk, 0 },
{ X86::VMULPSZrrk, X86::VMULPSZrmk, 0 },		{ X86::VMULPSZrrk, X86::VMULPSZrmk, 0 },
[93 lines not shown]
};		};

for (X86MemoryFoldTableEntry Entry : MemoryFoldTable4) {		for (X86MemoryFoldTableEntry Entry : MemoryFoldTable4) {
AddTableEntry(RegOp2MemOpTable4, MemOp2RegOpTable,		AddTableEntry(RegOp2MemOpTable4, MemOp2RegOpTable,
Entry.RegOp, Entry.MemOp,		Entry.RegOp, Entry.MemOp,
// Index 4, folded load		// Index 4, folded load
Entry.Flags | TB_INDEX_4 | TB_FOLDED_LOAD);		Entry.Flags | TB_INDEX_4 | TB_FOLDED_LOAD);
}		}
for (I = X86InstrFMA3Info::rm_begin(); I != E; ++I)
if (I.getGroup()->isKMasked())		// Add FMA3 instructions.
		for (size_t i = 0; i != array_lengthof(FMA3RegMaskedOpcodes); ++i) {
		for (size_t j = 0; j != 3; ++j) {
AddTableEntry(RegOp2MemOpTable4, MemOp2RegOpTable,		AddTableEntry(RegOp2MemOpTable4, MemOp2RegOpTable,
I.getRegOpcode(), I.getMemOpcode(),		FMA3RegMaskedOpcodes[i].Opcodes[j],
		FMA3MemMaskedOpcodes[i].Opcodes[j],
		// Index 4, folded load
TB_ALIGN_NONE | TB_INDEX_4 | TB_FOLDED_LOAD);		TB_ALIGN_NONE | TB_INDEX_4 | TB_FOLDED_LOAD);
}		}
		}
		}

void		void
X86InstrInfo::AddTableEntry(RegOp2MemOpTableType &R2MTable,		X86InstrInfo::AddTableEntry(RegOp2MemOpTableType &R2MTable,
MemOp2RegOpTableType &M2RTable,		MemOp2RegOpTableType &M2RTable,
uint16_t RegOp, uint16_t MemOp, uint16_t Flags) {		uint16_t RegOp, uint16_t MemOp, uint16_t Flags) {
if ((Flags & TB_NO_FORWARD) == 0) {		if ((Flags & TB_NO_FORWARD) == 0) {
assert(!R2MTable.count(RegOp) && "Duplicate entry!");		assert(!R2MTable.count(RegOp) && "Duplicate entry!");
R2MTable[RegOp] = std::make_pair(MemOp, Flags);		R2MTable[RegOp] = std::make_pair(MemOp, Flags);
[1,047 lines not shown]	X86InstrInfo::convertToThreeAddress(MachineFunction::iterator &MFI,
}		}

MFI->insert(MI.getIterator(), NewMI); // Insert the new inst		MFI->insert(MI.getIterator(), NewMI); // Insert the new inst
return NewMI;		return NewMI;
}		}

unsigned X86InstrInfo::getFMA3OpcodeToCommuteOperands(		unsigned X86InstrInfo::getFMA3OpcodeToCommuteOperands(
const MachineInstr &MI, unsigned SrcOpIdx1, unsigned SrcOpIdx2,		const MachineInstr &MI, unsigned SrcOpIdx1, unsigned SrcOpIdx2,
const X86InstrFMA3Group &FMA3Group) const {		const X86FMA3Group &FMA3Group) const {

unsigned Opc = MI.getOpcode();		uint64_t TSFlags = MI.getDesc().TSFlags;

// Put the lowest index to SrcOpIdx1 to simplify the checks below.		// Put the lowest index to SrcOpIdx1 to simplify the checks below.
if (SrcOpIdx1 > SrcOpIdx2)		if (SrcOpIdx1 > SrcOpIdx2)
std::swap(SrcOpIdx1, SrcOpIdx2);		std::swap(SrcOpIdx1, SrcOpIdx2);

// TODO: Commuting the 1st operand of FMA*_Int requires some additional		// TODO: Commuting the 1st operand of FMA*_Int requires some additional
// analysis. The commute optimization is legal only if all users of FMA*_Int		// analysis. The commute optimization is legal only if all users of FMA*_Int
// use only the lowest element of the FMA*_Int instruction. Such analysis are		// use only the lowest element of the FMA*_Int instruction. Such analysis are
// not implemented yet. So, just return 0 in that case.		// not implemented yet. So, just return 0 in that case.
// When such analysis are available this place will be the right place for		// When such analysis are available this place will be the right place for
// calling it.		// calling it.
if (FMA3Group.isIntrinsic() && SrcOpIdx1 == 1)		if (X86II::isFMA3Intrinsic(TSFlags) && SrcOpIdx1 == 1)
return 0;		return 0;

unsigned FMAOp1 = 1, FMAOp2 = 2, FMAOp3 = 3;		unsigned FMAOp1 = 1, FMAOp2 = 2, FMAOp3 = 3;
if (FMA3Group.isKMasked()) {		if (X86II::isKMasked(TSFlags)) {
// The k-mask operand cannot be commuted.		// The k-mask operand cannot be commuted.
if (SrcOpIdx1 == 2)		if (SrcOpIdx1 == 2)
return 0;		return 0;

// For k-zero-masked operations it is Ok to commute the first vector		// For k-zero-masked operations it is Ok to commute the first vector
// operand.		// operand.
// For regular k-masked operations a conservative choice is done as the		// For regular k-masked operations a conservative choice is done as the
// elements of the first vector operand, for which the corresponding bit		// elements of the first vector operand, for which the corresponding bit
// in the k-mask operand is set to 0, are copied to the result of FMA.		// in the k-mask operand is set to 0, are copied to the result of FMA.
// TODO/FIXME: The commute still may be legal if it is known that the		// TODO/FIXME: The commute still may be legal if it is known that the
// k-mask operand is set to either all ones or all zeroes.		// k-mask operand is set to either all ones or all zeroes.
// It is also Ok to commute the 1st operand if all users of MI use only		// It is also Ok to commute the 1st operand if all users of MI use only
// the elements enabled by the k-mask operand. For example,		// the elements enabled by the k-mask operand. For example,
// v4 = VFMADD213PSZrk v1, k, v2, v3; // v1[i] = k[i] ? v2[i]*v1[i]+v3[i]		// v4 = VFMADD213PSZrk v1, k, v2, v3; // v1[i] = k[i] ? v2[i]*v1[i]+v3[i]
// : v1[i];		// : v1[i];
// VMOVAPSZmrk <mem_addr>, k, v4; // this is the ONLY user of v4 ->		// VMOVAPSZmrk <mem_addr>, k, v4; // this is the ONLY user of v4 ->
// // Ok, to commute v1 in FMADD213PSZrk.		// // Ok, to commute v1 in FMADD213PSZrk.
if (FMA3Group.isKMergeMasked() && SrcOpIdx1 == FMAOp1)		if (X86II::isKMergeMasked(TSFlags) && SrcOpIdx1 == FMAOp1)
return 0;		return 0;
FMAOp2++;		FMAOp2++;
FMAOp3++;		FMAOp3++;
}		}

unsigned Case;		unsigned Case;
if (SrcOpIdx1 == FMAOp1 && SrcOpIdx2 == FMAOp2)		if (SrcOpIdx1 == FMAOp1 && SrcOpIdx2 == FMAOp2)
Case = 0;		Case = 0;
else if (SrcOpIdx1 == FMAOp1 && SrcOpIdx2 == FMAOp3)		else if (SrcOpIdx1 == FMAOp1 && SrcOpIdx2 == FMAOp3)
Case = 1;		Case = 1;
else if (SrcOpIdx1 == FMAOp2 && SrcOpIdx2 == FMAOp3)		else if (SrcOpIdx1 == FMAOp2 && SrcOpIdx2 == FMAOp3)
Case = 2;		Case = 2;
else		else
return 0;		return 0;

// Define the FMA forms mapping array that helps to map input FMA form		// Define the FMA forms mapping array that helps to map input FMA form
// to output FMA form to preserve the operation semantics after		// to output FMA form to preserve the operation semantics after
// commuting the operands.		// commuting the operands.
const unsigned Form132Index = 0;
const unsigned Form213Index = 1;
const unsigned Form231Index = 2;
static const unsigned FormMapping[][3] = {		static const unsigned FormMapping[][3] = {
// 0: SrcOpIdx1 == 1 && SrcOpIdx2 == 2;		// 0: SrcOpIdx1 == 1 && SrcOpIdx2 == 2;
// FMA132 A, C, b; ==> FMA231 C, A, b;		// FMA132 A, C, b; ==> FMA231 C, A, b;
// FMA213 B, A, c; ==> FMA213 A, B, c;		// FMA213 B, A, c; ==> FMA213 A, B, c;
// FMA231 C, A, b; ==> FMA132 A, C, b;		// FMA231 C, A, b; ==> FMA132 A, C, b;
{ Form231Index, Form213Index, Form132Index },		{ X86::FMA3Form231, X86::FMA3Form213, X86::FMA3Form132 },
// 1: SrcOpIdx1 == 1 && SrcOpIdx2 == 3;		// 1: SrcOpIdx1 == 1 && SrcOpIdx2 == 3;
// FMA132 A, c, B; ==> FMA132 B, c, A;		// FMA132 A, c, B; ==> FMA132 B, c, A;
// FMA213 B, a, C; ==> FMA231 C, a, B;		// FMA213 B, a, C; ==> FMA231 C, a, B;
// FMA231 C, a, B; ==> FMA213 B, a, C;		// FMA231 C, a, B; ==> FMA213 B, a, C;
{ Form132Index, Form231Index, Form213Index },		{ X86::FMA3Form132, X86::FMA3Form231, X86::FMA3Form213 },
// 2: SrcOpIdx1 == 2 && SrcOpIdx2 == 3;		// 2: SrcOpIdx1 == 2 && SrcOpIdx2 == 3;
// FMA132 a, C, B; ==> FMA213 a, B, C;		// FMA132 a, C, B; ==> FMA213 a, B, C;
// FMA213 b, A, C; ==> FMA132 b, C, A;		// FMA213 b, A, C; ==> FMA132 b, C, A;
// FMA231 c, A, B; ==> FMA231 c, B, A;		// FMA231 c, A, B; ==> FMA231 c, B, A;
{ Form213Index, Form132Index, Form231Index }		{ X86::FMA3Form213, X86::FMA3Form132, X86::FMA3Form231 }
};		};

unsigned FMAForms[3];
if (FMA3Group.isRegOpcodeFromGroup(Opc)) {
FMAForms[0] = FMA3Group.getReg132Opcode();
FMAForms[1] = FMA3Group.getReg213Opcode();
FMAForms[2] = FMA3Group.getReg231Opcode();
} else {
FMAForms[0] = FMA3Group.getMem132Opcode();
FMAForms[1] = FMA3Group.getMem213Opcode();
FMAForms[2] = FMA3Group.getMem231Opcode();
}
unsigned FormIndex;
for (FormIndex = 0; FormIndex < 3; FormIndex++)
if (Opc == FMAForms[FormIndex])
break;

// Everything is ready, just adjust the FMA opcode and return it.		// Everything is ready, just adjust the FMA opcode and return it.
FormIndex = FormMapping[Case][FormIndex];		unsigned FormIndex = FormMapping[Case][X86II::getFMA3Form(TSFlags)];
return FMAForms[FormIndex];		return FMA3Group.Opcodes[FormIndex];
}		}
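
The FormMapping table above can be checked in isolation. The following is a minimal sketch, not code from the patch: the enum values, `evalFMA3`, `GroupSketch` and `commutedOpcode` are hypothetical stand-ins. It assumes the usual FMA3 naming, where form 132 computes op1 * op3 + op2, form 213 computes op2 * op1 + op3, and form 231 computes op2 * op3 + op1.

```cpp
#include <cassert>

// Hypothetical stand-ins for the form values used as table indices in the diff.
enum FMA3FormSketch : unsigned { Form132 = 0, Form213 = 1, Form231 = 2 };

// Row = which operand pair is swapped, column = input form.
static const unsigned FormMapping[3][3] = {
    {Form231, Form213, Form132}, // case 0: operands 1 and 2 swapped
    {Form132, Form231, Form213}, // case 1: operands 1 and 3 swapped
    {Form213, Form132, Form231}, // case 2: operands 2 and 3 swapped
};

// Integer model of what each form computes (assumption: standard FMA3 naming).
long evalFMA3(unsigned Form, long Op1, long Op2, long Op3) {
  switch (Form) {
  case Form132: return Op1 * Op3 + Op2;
  case Form213: return Op2 * Op1 + Op3;
  default:      return Op2 * Op3 + Op1; // Form231
  }
}

// Hypothetical group: one opcode per form, indexed by form, mirroring
// FMA3Group.Opcodes[FormIndex] on the right-hand side of the diff.
struct GroupSketch {
  unsigned Opcodes[3];
};

// Pick the opcode that keeps the result unchanged after the operand swap.
unsigned commutedOpcode(const GroupSketch &G, unsigned Case, unsigned Form) {
  return G.Opcodes[FormMapping[Case][Form]];
}
```

Each row of the table is its own inverse, so commuting the same operand pair twice returns the original form.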

MachineInstr *X86InstrInfo::commuteInstructionImpl(MachineInstr &MI, bool NewMI,		MachineInstr *X86InstrInfo::commuteInstructionImpl(MachineInstr &MI, bool NewMI,
unsigned OpIdx1,		unsigned OpIdx1,
unsigned OpIdx2) const {		unsigned OpIdx2) const {
auto cloneIfNew = [NewMI](MachineInstr &MI) -> MachineInstr & {		auto cloneIfNew = [NewMI](MachineInstr &MI) -> MachineInstr & {
if (NewMI)		if (NewMI)
return *MI.getParent()->getParent()->CloneMachineInstr(&MI);		return *MI.getParent()->getParent()->CloneMachineInstr(&MI);
[...]
case X86::CMOVNO16rr: case X86::CMOVNO32rr: case X86::CMOVNO64rr: {
case X86::CMOVNO64rr: Opc = X86::CMOVO64rr; break;		case X86::CMOVNO64rr: Opc = X86::CMOVO64rr; break;
}		}
auto &WorkingMI = cloneIfNew(MI);		auto &WorkingMI = cloneIfNew(MI);
WorkingMI.setDesc(get(Opc));		WorkingMI.setDesc(get(Opc));
return TargetInstrInfo::commuteInstructionImpl(WorkingMI, /*NewMI=*/false,		return TargetInstrInfo::commuteInstructionImpl(WorkingMI, /*NewMI=*/false,
OpIdx1, OpIdx2);		OpIdx1, OpIdx2);
}		}
default:		default:
const X86InstrFMA3Group *FMA3Group =		if (const X86FMA3Group *FMA3Group = getFMA3Group(MI.getOpcode(),
X86InstrFMA3Info::getFMA3Group(MI.getOpcode());		MI.getDesc().TSFlags)) {
if (FMA3Group) {
unsigned Opc =		unsigned Opc =
getFMA3OpcodeToCommuteOperands(MI, OpIdx1, OpIdx2, *FMA3Group);		getFMA3OpcodeToCommuteOperands(MI, OpIdx1, OpIdx2, *FMA3Group);
if (Opc == 0)		if (Opc == 0)
return nullptr;		return nullptr;
auto &WorkingMI = cloneIfNew(MI);		auto &WorkingMI = cloneIfNew(MI);
WorkingMI.setDesc(get(Opc));		WorkingMI.setDesc(get(Opc));
return TargetInstrInfo::commuteInstructionImpl(WorkingMI, /*NewMI=*/false,		return TargetInstrInfo::commuteInstructionImpl(WorkingMI, /*NewMI=*/false,
OpIdx1, OpIdx2);		OpIdx1, OpIdx2);
}		}

return TargetInstrInfo::commuteInstructionImpl(MI, NewMI, OpIdx1, OpIdx2);		return TargetInstrInfo::commuteInstructionImpl(MI, NewMI, OpIdx1, OpIdx2);
}		}
}		}
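
The new `getFMA3Group(Opcode, TSFlags)` call on the right-hand side is described in the summary as a binary search through the column of a sorted table selected by the instruction's form. A rough sketch of that idea follows. It is not the patch's implementation: `GroupSketch`, `Table`, `findGroup` and the opcode numbers are invented for illustration, and the real code also has to choose among six tables using other TSFlags bits.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical row: one opcode per form (132, 213, 231).
struct GroupSketch {
  unsigned Opcodes[3];
};

// Made-up opcode numbers. Because the three forms of an instruction differ
// only in the form digits of their names, sorting by one column is assumed
// to leave the other two columns sorted as well.
static const GroupSketch Table[] = {
    {{10, 11, 12}},
    {{20, 21, 22}},
    {{30, 31, 32}},
};

// Binary search in the column given by Form (0 = 132, 1 = 213, 2 = 231).
// Returns nullptr when Opc does not appear in that column.
const GroupSketch *findGroup(unsigned Opc, unsigned Form) {
  const GroupSketch *Begin = Table;
  const GroupSketch *End = Table + sizeof(Table) / sizeof(Table[0]);
  const GroupSketch *I = std::lower_bound(
      Begin, End, Opc, [Form](const GroupSketch &G, unsigned O) {
        return G.Opcodes[Form] < O;
      });
  if (I != End && I->Opcodes[Form] == Opc)
    return I;
  return nullptr;
}
```

A lookup costs O(log n) comparisons and needs no startup work, which is the trade the summary makes against populating a DenseMap.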

bool X86InstrInfo::findFMA3CommutedOpIndices(		bool
const MachineInstr &MI, unsigned &SrcOpIdx1, unsigned &SrcOpIdx2,		X86InstrInfo::findFMA3CommutedOpIndices(const MachineInstr &MI,
const X86InstrFMA3Group &FMA3Group) const {		unsigned &SrcOpIdx1,
		unsigned &SrcOpIdx2,
		const X86FMA3Group &FMA3Group) const {
		uint64_t TSFlags = MI.getDesc().TSFlags;
unsigned FirstCommutableVecOp = 1;		unsigned FirstCommutableVecOp = 1;
unsigned LastCommutableVecOp = 3;		unsigned LastCommutableVecOp = 3;
unsigned KMaskOp = 0;		unsigned KMaskOp = 0;
if (FMA3Group.isKMasked()) {		if (TSFlags & X86II::EVEX_K) {
// The k-mask operand has index = 2 for masked and zero-masked operations.		// The k-mask operand has index = 2 for masked and zero-masked operations.
KMaskOp = 2;		KMaskOp = 2;

// The operand with index = 1 is used as a source for those elements for		// The operand with index = 1 is used as a source for those elements for
// which the corresponding bit in the k-mask is set to 0.		// which the corresponding bit in the k-mask is set to 0.
if (FMA3Group.isKMergeMasked())		if (!(TSFlags & X86II::EVEX_Z))
FirstCommutableVecOp = 3;		FirstCommutableVecOp = 3;

LastCommutableVecOp++;		LastCommutableVecOp++;
}		}

if (isMem(MI, LastCommutableVecOp))		if (isMem(MI, LastCommutableVecOp))
LastCommutableVecOp--;		LastCommutableVecOp--;

[...]
case X86::VCMPPSYrri: {
case 0x07: // ORDERED		case 0x07: // ORDERED
// The indices of the commutable operands are 1 and 2.		// The indices of the commutable operands are 1 and 2.
// Assign them to the returned operand indices here.		// Assign them to the returned operand indices here.
return fixCommutedOpIndices(SrcOpIdx1, SrcOpIdx2, 1, 2);		return fixCommutedOpIndices(SrcOpIdx1, SrcOpIdx2, 1, 2);
}		}
return false;		return false;
}		}
default:		default:
const X86InstrFMA3Group *FMA3Group =		if (const X86FMA3Group *FMA3Group = getFMA3Group(MI.getOpcode(),
X86InstrFMA3Info::getFMA3Group(MI.getOpcode());		MI.getDesc().TSFlags))
if (FMA3Group)
return findFMA3CommutedOpIndices(MI, SrcOpIdx1, SrcOpIdx2, *FMA3Group);		return findFMA3CommutedOpIndices(MI, SrcOpIdx1, SrcOpIdx2, *FMA3Group);
return TargetInstrInfo::findCommutedOpIndices(MI, SrcOpIdx1, SrcOpIdx2);		return TargetInstrInfo::findCommutedOpIndices(MI, SrcOpIdx1, SrcOpIdx2);
}		}
return false;		return false;
}		}

static X86::CondCode getCondFromBranchOpc(unsigned BrOpc) {		static X86::CondCode getCondFromBranchOpc(unsigned BrOpc) {
switch (BrOpc) {		switch (BrOpc) {
[...]