This is an archive of the discontinued LLVM Phabricator instance.

[mips][msa] Implement f16 support
ClosedPublic

Authored by sdardis on Nov 8 2016, 7:31 AM.

Details

Reviewers

vkalintiris
zoran.jovanovic

Commits

rG0e2ee3b4b983: [mips][msa] Implement f16 support
rL287349: [mips][msa] Implement f16 support

Summary

The MIPS MSA ASE provides instructions to convert to and from half precision
floating point. This patch teaches the MIPS backend to treat f16 as a legal
type and how to promote such values to f32 for the usual set of operations.

As a result of this, the fexup[lr].w intrinsics no longer crash LLVM during
type legalization.
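
To illustrate what promoting f16 to f32 means for an operation such as a comparison, here is a minimal host-side C++ sketch. It models the widening step in software only. The names `halfToFloat` and `f16LessThan` are hypothetical and are not part of the patch, which performs the widening with MSA instructions.

```cpp
#include <cassert>  // only needed for the checks in the usage note
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: widen an IEEE 754 binary16 bit pattern to float.
// Every binary16 value is exactly representable as a float.
float halfToFloat(std::uint16_t h) {
  const int sign = (h >> 15) & 0x1;
  const int exp = (h >> 10) & 0x1F;
  const int man = h & 0x3FF;
  float v;
  if (exp == 0) {
    // Zero or subnormal: value is man * 2^-24.
    v = std::ldexp(static_cast<float>(man), -24);
  } else if (exp == 0x1F) {
    // Infinity or NaN.
    v = man ? std::numeric_limits<float>::quiet_NaN()
            : std::numeric_limits<float>::infinity();
  } else {
    // Normal: value is (1024 + man) * 2^(exp - 25).
    v = std::ldexp(static_cast<float>(man | 0x400), exp - 25);
  }
  return sign ? -v : v;
}

// Hypothetical model of a promoted comparison: widen both operands,
// then compare them as f32.
bool f16LessThan(std::uint16_t a, std::uint16_t b) {
  return halfToFloat(a) < halfToFloat(b);
}
```

For example, `assert(f16LessThan(0x3C00, 0x4000));` holds, since 0x3C00 encodes 1.0 and 0x4000 encodes 2.0.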

Diff Detail

Repository: rL LLVM

Event Timeline

sdardis updated this revision to Diff 77192. Nov 8 2016, 7:31 AM

sdardis retitled this revision to [mips][msa] Implement f16 support.

sdardis updated this object.

sdardis added a reviewer: vkalintiris.

sdardis set the repository for this revision to rL LLVM.

sdardis added a subscriber: llvm-commits.

Herald edited edge metadata. Nov 8 2016, 7:31 AM

zoran.jovanovic added a subscriber: zoran.jovanovic. Nov 16 2016, 7:33 AM

zoran.jovanovic added inline comments.

lib/Target/Mips/MipsMSAInstrInfo.td
3747 ↗	(On Diff #77192)	Nit: Next two lines exceed 80 characters.
3752 ↗	(On Diff #77192)	Nit: Next two lines exceed 80 characters.
3757 ↗	(On Diff #77192)	Nit: Next two lines exceed 80 characters.
3762 ↗	(On Diff #77192)	Nit: Next two lines exceed 80 characters.
lib/Target/Mips/MipsSEISelLowering.cpp
3448 ↗	(On Diff #77192)	Nit: I believe that Mips::GPR64RegClass and the 64-bit variants of the instructions should be used with N32 too. The same goes for the code in MipsSETargetLowering::emitLD_F16_PSEUDO.
3655 ↗	(On Diff #77192)	Nit: Is this comment correct? From the code below (and the corresponding test cases) it seems that the fexupr.d instruction should be generated after this one.
3689 ↗	(On Diff #77192)	If the comment above is correct, IsFGR64 should be replaced with IsFGR64onMips64.

LGTM otherwise.

This revision is now accepted and ready to land. Nov 16 2016, 7:34 AM

As I mention in my inline comments, I'm revising this patch. I'll highlight the differences when I repost.

lib/Target/Mips/MipsSEISelLowering.cpp
3448 ↗	(On Diff #77192)	Looking at this, I have found test cases where N32 requires either GPR32RegClass or GPR64RegClass depending on the context. I'll post a revised version of this patch soon. I've also found 1 or 2 places where the wrong MSA regclass is used.
3655 ↗	(On Diff #77192)	Yes, you're correct, I missed that when writing the comment. Probably a copy/paste thing.
3689 ↗	(On Diff #77192)	(For completeness' sake) The comment above is wrong. If we're expanding an f16 to FPR64, we need FEXUPR_D as well. The case of Mips32/Mips64 doesn't apply here.
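
As a rough host-side illustration of the point above, the sketch below models the second widening step (f32 to f64, the role played by fexupr.d) and the MIPS32 transfer of the 64-bit result as two 32-bit halves, as described in the diff's comments for the FGR64-on-Mips32 case. The names `widenToDouble`, `splitDouble` and `joinDouble` are hypothetical and exist only for this example.

```cpp
#include <cassert>  // only needed for the checks in the usage note
#include <cstdint>
#include <cstring>

// Second widening step: f32 -> f64. Widening a float to double is exact.
double widenToDouble(float f) { return static_cast<double>(f); }

// The two 32-bit words that would travel through two GPRs.
struct Halves {
  std::uint32_t lo;
  std::uint32_t hi;
};

// Hypothetical model of reading element 0 and element 1 of the widened
// value as 32-bit words.
Halves splitDouble(double d) {
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return {static_cast<std::uint32_t>(bits),
          static_cast<std::uint32_t>(bits >> 32)};
}

// Hypothetical model of writing the low word and then the high word of a
// 64-bit floating point register.
double joinDouble(Halves h) {
  const std::uint64_t bits = (static_cast<std::uint64_t>(h.hi) << 32) | h.lo;
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}
```

For example, `assert(joinDouble(splitDouble(widenToDouble(1.5f))) == 1.5);` holds because both steps preserve the value exactly.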

Updated the handling of loads and stores for f16 to better match the register class that is actually used. Added test coverage for that area. We're still unable to enable the machine verifier for this, as some MSA instructions relevant to this patch (fill, copy_[su].[bhwd]) are missing their 64-bit counterparts.
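
The diff's comments explain why the f16 store goes through copy_u.h and sh rather than st.h: a full vector store would overwrite the memory after the destination. The following C++ sketch models that difference on a plain byte buffer. `Vec8xH`, `storeLane0` and `storeWholeVector` are hypothetical names for illustration and do not correspond to anything in LLVM.

```cpp
#include <array>
#include <cassert>  // only needed for the checks in the usage note
#include <cstddef>
#include <cstdint>
#include <cstring>

// Model of a 128-bit MSA register viewed as eight 16-bit lanes.
using Vec8xH = std::array<std::uint16_t, 8>;

// Model of the copy_u.h + sh sequence: write only the 2 bytes of lane 0.
void storeLane0(const Vec8xH &w, unsigned char *mem, std::size_t offset) {
  const std::uint16_t lane0 = w[0];
  std::memcpy(mem + offset, &lane0, sizeof lane0);
}

// Model of a full vector store: writes all 16 bytes, including the 14
// bytes that follow the f16 value.
void storeWholeVector(const Vec8xH &w, unsigned char *mem,
                      std::size_t offset) {
  std::memcpy(mem + offset, w.data(), sizeof(std::uint16_t) * w.size());
}
```

With a buffer pre-filled with a marker byte, `storeLane0` leaves every byte outside the two-byte slot untouched, whereas `storeWholeVector` clobbers the neighbouring bytes.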

This revision is now accepted and ready to land. Nov 17 2016, 4:52 AM

Herald edited edge metadata. Nov 17 2016, 4:52 AM

Fix usage of MSA register class for f16 to FGR64 expansion.

Herald edited edge metadata. Nov 17 2016, 4:59 AM

sdardis added inline comments. Nov 17 2016, 5:05 AM

lib/Target/Mips/MipsSEISelLowering.cpp
3448 ↗	(On Diff #77192)	I've updated this to use the register class of the operand if it's a register, otherwise to use GPR64 if the ABI is not O32. Likewise for LD_F16.
test/CodeGen/Mips/msa/f16-llvm-ir.ll
24–62 ↗	(On Diff #78350)	New test coverage for testing frame indexed accesses.
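
The selection rule described in the inline comment above can be sketched in plain C++ as follows. This is only a model under stated assumptions: `ABI`, `GPRClass`, `pickGPRClass` and `storeOpcodeName` are hypothetical stand-ins, not the LLVM types or functions used by the patch.

```cpp
#include <cassert>  // only needed for the checks in the usage note

// Hypothetical stand-ins for the LLVM types involved; not the real API.
enum class ABI { O32, N32, N64 };
enum class GPRClass { GPR32, GPR64 };

// Pick the register class for the temporary GPR. `operandClass` is null
// when the address operand is not a register (for example a frame index).
GPRClass pickGPRClass(const GPRClass *operandClass, ABI abi) {
  if (operandClass)
    return *operandClass;  // follow the operand's own register class
  // Otherwise fall back to the ABI default.
  return abi == ABI::O32 ? GPRClass::GPR32 : GPRClass::GPR64;
}

// The halfword store opcode then follows from the chosen class.
const char *storeOpcodeName(GPRClass rc) {
  return rc == GPRClass::GPR32 ? "SH" : "SH64";
}
```

Under this model an N32 function can end up with either class: a GPR32 register operand keeps GPR32, while a non-register operand falls back to GPR64.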

These changes OK?

LGTM.

Closed by commit rL287349: [mips][msa] Implement f16 support (authored by sdardis). Nov 18 2016, 8:27 AM

This revision was automatically updated to reflect the committed changes.

For cross referencing purposes, the tests included in the first diff should have been part of the last diff. They have been committed in rL287574.

Revision Contents

Path	Size
llvm/trunk/lib/Target/Mips/MipsMSAInstrInfo.td	50 lines
llvm/trunk/lib/Target/Mips/MipsRegisterInfo.td	6 lines
llvm/trunk/lib/Target/Mips/MipsSEISelLowering.h	14 lines
llvm/trunk/lib/Target/Mips/MipsSEISelLowering.cpp	348 lines

Diff 78536

llvm/trunk/lib/Target/Mips/MipsMSAInstrInfo.td

[... 3,725 lines not shown ...]
def SZ_H_PSEUDO : MSA_CBRANCH_PSEUDO_DESC_BASE<MipsVAllZero, v8i16,
MSA128H, NoItinerary>;
def SZ_W_PSEUDO : MSA_CBRANCH_PSEUDO_DESC_BASE<MipsVAllZero, v4i32,
MSA128W, NoItinerary>;
def SZ_D_PSEUDO : MSA_CBRANCH_PSEUDO_DESC_BASE<MipsVAllZero, v2i64,
MSA128D, NoItinerary>;
def SZ_V_PSEUDO : MSA_CBRANCH_PSEUDO_DESC_BASE<MipsVAnyZero, v16i8,
MSA128B, NoItinerary>;

		// Pseudos used to implement transparent fp16 support.

		let Predicates = [HasMSA] in {
		def ST_F16 : MipsPseudo<(outs), (ins MSA128F16:$ws, mem_simm10:$addr),
		[(store (f16 MSA128F16:$ws), (addrimm10:$addr))]> {
		let usesCustomInserter = 1;
		}

		def LD_F16 : MipsPseudo<(outs MSA128F16:$ws), (ins mem_simm10:$addr),
		[(set MSA128F16:$ws, (f16 (load addrimm10:$addr)))]> {
		let usesCustomInserter = 1;
		}

		def MSA_FP_EXTEND_W_PSEUDO : MipsPseudo<(outs FGR32Opnd:$fd),
		(ins MSA128F16:$ws),
		[(set FGR32Opnd:$fd,
		(f32 (fpextend MSA128F16:$ws)))]> {
		let usesCustomInserter = 1;
		}

		def MSA_FP_ROUND_W_PSEUDO : MipsPseudo<(outs MSA128F16:$wd),
		(ins FGR32Opnd:$fs),
		[(set MSA128F16:$wd,
		(f16 (fpround FGR32Opnd:$fs)))]> {
		let usesCustomInserter = 1;
		}

		def MSA_FP_EXTEND_D_PSEUDO : MipsPseudo<(outs FGR64Opnd:$fd),
		(ins MSA128F16:$ws),
		[(set FGR64Opnd:$fd,
		(f64 (fpextend MSA128F16:$ws)))]> {
		let usesCustomInserter = 1;
		}

		def MSA_FP_ROUND_D_PSEUDO : MipsPseudo<(outs MSA128F16:$wd),
		(ins FGR64Opnd:$fs),
		[(set MSA128F16:$wd,
		(f16 (fpround FGR64Opnd:$fs)))]> {
		let usesCustomInserter = 1;
		}

		def : MipsPat<(MipsTruncIntFP MSA128F16:$ws),
		(TRUNC_W_D64 (MSA_FP_EXTEND_D_PSEUDO MSA128F16:$ws))>;

		def : MipsPat<(MipsFPCmp MSA128F16:$ws, MSA128F16:$wt, imm:$cond),
		(FCMP_S32 (MSA_FP_EXTEND_W_PSEUDO MSA128F16:$ws),
		(MSA_FP_EXTEND_W_PSEUDO MSA128F16:$wt), imm:$cond)>,
		ISA_MIPS1_NOT_32R6_64R6;
		}

// Vector extraction with fixed index.
//
// Extracting 32-bit values on MSA32 should always use COPY_S_W rather than
// COPY_U_W, even for the zero-extended case. This is because our forward
// compatibility strategy is to consider registers to be infinitely
// sign-extended so that a MIPS64 can execute MIPS32 code without getting
// different register values.
def : MSAPat<(vextract_zext_i32 (v4i32 MSA128W:$ws), immZExt2Ptr:$idx),
[... 155 lines not shown ...]

llvm/trunk/lib/Target/Mips/MipsRegisterInfo.td

	[... 394 lines not shown ...]
	// FP condition code registers.
	def FCC : RegisterClass<"Mips", [i32], 32, (sequence "FCC%u", 0, 7)>,
	Unallocatable;

	// MIPS32r6/MIPS64r6 store FPU condition codes in normal FGR registers.
	// This class allows us to represent this in codegen patterns.
	def FGRCC : RegisterClass<"Mips", [i32], 32, (sequence "F%u", 0, 31)>;

				def MSA128F16 : RegisterClass<"Mips", [f16], 128, (sequence "W%u", 0, 31)>;

	def MSA128B: RegisterClass<"Mips", [v16i8], 128,
	(sequence "W%u", 0, 31)>;
	def MSA128H: RegisterClass<"Mips", [v8i16, v8f16], 128,
	(sequence "W%u", 0, 31)>;
	def MSA128W: RegisterClass<"Mips", [v4i32, v4f32], 128,
	(sequence "W%u", 0, 31)>;
	def MSA128D: RegisterClass<"Mips", [v2i64, v2f64], 128,
	(sequence "W%u", 0, 31)>;
	[... 230 lines not shown ...]
	def COP2Opnd : RegisterOperand<COP2> {
	let ParserMatchClass = COP2AsmOperand;
	}

	def COP3Opnd : RegisterOperand<COP3> {
	let ParserMatchClass = COP3AsmOperand;
	}

				def MSA128F16Opnd : RegisterOperand<MSA128F16> {
				let ParserMatchClass = MSA128AsmOperand;
				}

	def MSA128BOpnd : RegisterOperand<MSA128B> {
	let ParserMatchClass = MSA128AsmOperand;
	}

	def MSA128HOpnd : RegisterOperand<MSA128H> {
	let ParserMatchClass = MSA128AsmOperand;
	}

	[... 11 lines not shown ...]

llvm/trunk/lib/Target/Mips/MipsSEISelLowering.h

[... 105 lines not shown ...]
private:
MachineBasicBlock *emitFILL_FD(MachineInstr &MI,
MachineBasicBlock *BB) const;
/// \brief Emit the FEXP2_W_1 pseudo instructions.
MachineBasicBlock *emitFEXP2_W_1(MachineInstr &MI,
MachineBasicBlock *BB) const;
/// \brief Emit the FEXP2_D_1 pseudo instructions.
MachineBasicBlock *emitFEXP2_D_1(MachineInstr &MI,
MachineBasicBlock *BB) const;
		/// \brief Emit the LD_F16 pseudo instruction.
		MachineBasicBlock *emitLD_F16_PSEUDO(MachineInstr &MI,
		MachineBasicBlock *BB) const;
		/// \brief Emit the ST_F16 pseudo instruction.
		MachineBasicBlock *emitST_F16_PSEUDO(MachineInstr &MI,
		MachineBasicBlock *BB) const;
		/// \brief Emit the FPEXTEND_PSEUDO instructions.
		MachineBasicBlock *emitFPEXTEND_PSEUDO(MachineInstr &MI,
		MachineBasicBlock *BB,
		bool IsFGR64) const;
		/// \brief Emit the FPROUND_PSEUDO instructions.
		MachineBasicBlock *emitFPROUND_PSEUDO(MachineInstr &MI,
		MachineBasicBlock *BB,
		bool IsFGR64) const;
};
}

#endif

llvm/trunk/lib/Target/Mips/MipsSEISelLowering.cpp

[... 86 lines not shown ...]
if (Subtarget.hasMSA()) {
addMSAIntType(MVT::v16i8, &Mips::MSA128BRegClass);
addMSAIntType(MVT::v8i16, &Mips::MSA128HRegClass);
addMSAIntType(MVT::v4i32, &Mips::MSA128WRegClass);
addMSAIntType(MVT::v2i64, &Mips::MSA128DRegClass);
addMSAFloatType(MVT::v8f16, &Mips::MSA128HRegClass);
addMSAFloatType(MVT::v4f32, &Mips::MSA128WRegClass);
addMSAFloatType(MVT::v2f64, &Mips::MSA128DRegClass);

		// f16 is a storage-only type, always promote it to f32.
		addRegisterClass(MVT::f16, &Mips::MSA128HRegClass);
		setOperationAction(ISD::SETCC, MVT::f16, Promote);
		setOperationAction(ISD::BR_CC, MVT::f16, Promote);
		setOperationAction(ISD::SELECT_CC, MVT::f16, Promote);
		setOperationAction(ISD::SELECT, MVT::f16, Promote);
		setOperationAction(ISD::FADD, MVT::f16, Promote);
		setOperationAction(ISD::FSUB, MVT::f16, Promote);
		setOperationAction(ISD::FMUL, MVT::f16, Promote);
		setOperationAction(ISD::FDIV, MVT::f16, Promote);
		setOperationAction(ISD::FREM, MVT::f16, Promote);
		setOperationAction(ISD::FMA, MVT::f16, Promote);
		setOperationAction(ISD::FNEG, MVT::f16, Promote);
		setOperationAction(ISD::FABS, MVT::f16, Promote);
		setOperationAction(ISD::FCEIL, MVT::f16, Promote);
		setOperationAction(ISD::FCOPYSIGN, MVT::f16, Promote);
		setOperationAction(ISD::FCOS, MVT::f16, Promote);
		setOperationAction(ISD::FP_EXTEND, MVT::f16, Promote);
		setOperationAction(ISD::FFLOOR, MVT::f16, Promote);
		setOperationAction(ISD::FNEARBYINT, MVT::f16, Promote);
		setOperationAction(ISD::FPOW, MVT::f16, Promote);
		setOperationAction(ISD::FPOWI, MVT::f16, Promote);
		setOperationAction(ISD::FRINT, MVT::f16, Promote);
		setOperationAction(ISD::FSIN, MVT::f16, Promote);
		setOperationAction(ISD::FSINCOS, MVT::f16, Promote);
		setOperationAction(ISD::FSQRT, MVT::f16, Promote);
		setOperationAction(ISD::FEXP, MVT::f16, Promote);
		setOperationAction(ISD::FEXP2, MVT::f16, Promote);
		setOperationAction(ISD::FLOG, MVT::f16, Promote);
		setOperationAction(ISD::FLOG2, MVT::f16, Promote);
		setOperationAction(ISD::FLOG10, MVT::f16, Promote);
		setOperationAction(ISD::FROUND, MVT::f16, Promote);
		setOperationAction(ISD::FTRUNC, MVT::f16, Promote);
		setOperationAction(ISD::FMINNUM, MVT::f16, Promote);
		setOperationAction(ISD::FMAXNUM, MVT::f16, Promote);
		setOperationAction(ISD::FMINNAN, MVT::f16, Promote);
		setOperationAction(ISD::FMAXNAN, MVT::f16, Promote);

setTargetDAGCombine(ISD::AND);
setTargetDAGCombine(ISD::OR);
setTargetDAGCombine(ISD::SRA);
setTargetDAGCombine(ISD::VSELECT);
setTargetDAGCombine(ISD::XOR);
}

if (!Subtarget.useSoftFloat()) {
[... 1,064 lines not shown ...]
MipsSETargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
case Mips::FILL_FW_PSEUDO:
return emitFILL_FW(MI, BB);
case Mips::FILL_FD_PSEUDO:
return emitFILL_FD(MI, BB);
case Mips::FEXP2_W_1_PSEUDO:
return emitFEXP2_W_1(MI, BB);
case Mips::FEXP2_D_1_PSEUDO:
return emitFEXP2_D_1(MI, BB);
		case Mips::ST_F16:
		return emitST_F16_PSEUDO(MI, BB);
		case Mips::LD_F16:
		return emitLD_F16_PSEUDO(MI, BB);
		case Mips::MSA_FP_EXTEND_W_PSEUDO:
		return emitFPEXTEND_PSEUDO(MI, BB, false);
		case Mips::MSA_FP_ROUND_W_PSEUDO:
		return emitFPROUND_PSEUDO(MI, BB, false);
		case Mips::MSA_FP_EXTEND_D_PSEUDO:
		return emitFPEXTEND_PSEUDO(MI, BB, true);
		case Mips::MSA_FP_ROUND_D_PSEUDO:
		return emitFPROUND_PSEUDO(MI, BB, true);
}
}

bool MipsSETargetLowering::isEligibleForTailCallOptimization(
const CCState &CCInfo, unsigned NextStackOffset,
const MipsFunctionInfo &FI) const {
if (!UseMipsTailCalls)
return false;
[... 2,184 lines not shown ...]
BuildMI(*BB, MI, DL, TII->get(Mips::INSERT_SUBREG), Wt2)
.addReg(Fs)
.addImm(Mips::sub_64);
BuildMI(*BB, MI, DL, TII->get(Mips::SPLATI_D), Wd).addReg(Wt2).addImm(0);

MI.eraseFromParent(); // The pseudo instruction is gone now.
return BB;
}

		// Emit the ST_F16_PSEUDO instruction to store an f16 value from an MSA
		// register.
		//
		// ST_F16 MSA128F16:$wd, mem_simm10:$addr
		// =>
		// copy_u.h $rtemp,$wd[0]
		// sh $rtemp, $addr
		//
		// Safety: We can't use st.h & co as they would overwrite the memory after
		// the destination. It would require half floats be allocated 16 bytes(!) of
		// space.
		MachineBasicBlock *
		MipsSETargetLowering::emitST_F16_PSEUDO(MachineInstr &MI,
		MachineBasicBlock *BB) const {

		const TargetInstrInfo *TII = Subtarget.getInstrInfo();
		MachineRegisterInfo &RegInfo = BB->getParent()->getRegInfo();
		DebugLoc DL = MI.getDebugLoc();
		unsigned Ws = MI.getOperand(0).getReg();
		unsigned Rt = MI.getOperand(1).getReg();
		const MachineMemOperand &MMO = **MI.memoperands_begin();
		unsigned Imm = MMO.getOffset();

		// Caution: A load via the GOT can expand to a GPR32 operand, a load via
		// spill and reload can expand as a GPR64 operand. Examine the
		// operand in detail and default to ABI.
		const TargetRegisterClass *RC =
		MI.getOperand(1).isReg() ? RegInfo.getRegClass(MI.getOperand(1).getReg())
		: (Subtarget.isABI_O32() ? &Mips::GPR32RegClass
		: &Mips::GPR64RegClass);
		const bool UsingMips32 = RC == &Mips::GPR32RegClass;
		unsigned Rs = RegInfo.createVirtualRegister(RC);

		BuildMI(*BB, MI, DL, TII->get(Mips::COPY_U_H), Rs).addReg(Ws).addImm(0);
		BuildMI(*BB, MI, DL, TII->get(UsingMips32 ? Mips::SH : Mips::SH64))
		.addReg(Rs)
		.addReg(Rt)
		.addImm(Imm)
		.addMemOperand(BB->getParent()->getMachineMemOperand(
		&MMO, MMO.getOffset(), MMO.getSize()));

		MI.eraseFromParent();
		return BB;
		}

		// Emit the LD_F16_PSEUDO instruction to load an f16 value into an MSA register.
		//
		// LD_F16 MSA128F16:$wd, mem_simm10:$addr
		// =>
		// lh $rtemp, $addr
		// fill.h $wd, $rtemp
		//
		// Safety: We can't use ld.h & co as they over-read from the source.
		// Additionally, if the address is not modulo 16, 2 cases can occur:
		// a) Segmentation fault as the load instruction reads from a memory page
		// it's not supposed to.
		// b) The load crosses an implementation specific boundary, requiring OS
		// intervention.
		//
		MachineBasicBlock *
		MipsSETargetLowering::emitLD_F16_PSEUDO(MachineInstr &MI,
		MachineBasicBlock *BB) const {

		const TargetInstrInfo *TII = Subtarget.getInstrInfo();
		MachineRegisterInfo &RegInfo = BB->getParent()->getRegInfo();
		DebugLoc DL = MI.getDebugLoc();
		unsigned Wd = MI.getOperand(0).getReg();

		// Caution: A load via the GOT can expand to a GPR32 operand, a load via
		// spill and reload can expand as a GPR64 operand. Examine the
		// operand in detail and default to ABI.
		const TargetRegisterClass *RC =
		MI.getOperand(1).isReg() ? RegInfo.getRegClass(MI.getOperand(1).getReg())
		: (Subtarget.isABI_O32() ? &Mips::GPR32RegClass
		: &Mips::GPR64RegClass);

		const bool UsingMips32 = RC == &Mips::GPR32RegClass;
		unsigned Rt = RegInfo.createVirtualRegister(RC);

		MachineInstrBuilder MIB =
		BuildMI(*BB, MI, DL, TII->get(UsingMips32 ? Mips::LH : Mips::LH64), Rt);
		for (unsigned i = 1; i < MI.getNumOperands(); i++)
		MIB.addOperand(MI.getOperand(i));

		BuildMI(*BB, MI, DL, TII->get(Mips::FILL_H), Wd).addReg(Rt);

		MI.eraseFromParent();
		return BB;
		}

		// Emit the FPROUND_PSEUDO instruction.
		//
		// Round an FGR64Opnd, FGR32Opnd to an f16.
		//
		// Safety: Cycle the operand through the GPRs so the result always ends up
		// in the correct MSA register.
		//
		// FIXME: This copying is strictly unnecessary. If we could tie FGR32Opnd:$Fs
		// / FGR64Opnd:$Fs and MSA128F16:$Wd to the same physical register
		// (which they can be, as the MSA registers are defined to alias the
		// FPU's 64 bit and 32 bit registers) the result can be accessed using
		// the correct register class. That requires operands be tie-able across
		// register classes which have a sub/super register class relationship.
		//
		// For FGR32Opnd:
		//
		// FPROUND MSA128F16:$wd, FGR32Opnd:$fs
		// =>
		// mfc1 $rtemp, $fs
		// fill.w $wtemp, $rtemp
		// fexdo.h $wd, $wtemp, $wtemp
		//
		// For FGR64Opnd on mips32r2+:
		//
		// FPROUND MSA128F16:$wd, FGR64Opnd:$fs
		// =>
		// mfc1 $rtemp, $fs
		// fill.w $wtemp, $rtemp
		// mfhc1 $rtemp2, $fs
		// insert.w $wtemp[1], $rtemp2
		// insert.w $wtemp[3], $rtemp2
		// fexdo.w $wtemp2, $wtemp, $wtemp
		// fexdo.h $wd, $wtemp2, $wtemp2
		//
		// For FGR64Opnd on mips64r2+:
		//
		// FPROUND MSA128F16:$wd, FGR64Opnd:$fs
		// =>
		// dmfc1 $rtemp, $fs
		// fill.d $wtemp, $rtemp
		// fexdo.w $wtemp2, $wtemp, $wtemp
		// fexdo.h $wd, $wtemp2, $wtemp2
		//
		// Safety note: As $wtemp is UNDEF, we may provoke a spurious exception if the
		// undef bits are "just right" and the exception enable bits are
		// set. By using fill.w to replicate $fs into all elements over
		// insert.w for one element, we avoid that potential case. If
		// fexdo.[hw] causes an exception, the exception is valid and it
		// occurs for all elements.
		//
		MachineBasicBlock *
		MipsSETargetLowering::emitFPROUND_PSEUDO(MachineInstr &MI,
		MachineBasicBlock *BB,
		bool IsFGR64) const {

		// Strictly speaking, we need MIPS32R5 to support MSA. We'll be generous
		// here. It's technically doable to support MIPS32 here, but the ISA forbids
		// it.
		assert(Subtarget.hasMSA() && Subtarget.hasMips32r2());

		bool IsFGR64onMips64 = Subtarget.hasMips64() && IsFGR64;

		const TargetInstrInfo *TII = Subtarget.getInstrInfo();
		DebugLoc DL = MI.getDebugLoc();
		unsigned Wd = MI.getOperand(0).getReg();
		unsigned Fs = MI.getOperand(1).getReg();

		MachineRegisterInfo &RegInfo = BB->getParent()->getRegInfo();
		unsigned Wtemp = RegInfo.createVirtualRegister(&Mips::MSA128WRegClass);
		const TargetRegisterClass *GPRRC =
		IsFGR64onMips64 ? &Mips::GPR64RegClass : &Mips::GPR32RegClass;
		unsigned MFC1Opc = IsFGR64onMips64 ? Mips::DMFC1 : Mips::MFC1;
		unsigned FILLOpc = IsFGR64onMips64 ? Mips::FILL_D : Mips::FILL_W;

		// Perform the register class copy as mentioned above.
		unsigned Rtemp = RegInfo.createVirtualRegister(GPRRC);
		BuildMI(*BB, MI, DL, TII->get(MFC1Opc), Rtemp).addReg(Fs);
		BuildMI(*BB, MI, DL, TII->get(FILLOpc), Wtemp).addReg(Rtemp);
		unsigned WPHI = Wtemp;

		if (!Subtarget.hasMips64() && IsFGR64) {
		unsigned Rtemp2 = RegInfo.createVirtualRegister(GPRRC);
		BuildMI(*BB, MI, DL, TII->get(Mips::MFHC1_D64), Rtemp2).addReg(Fs);
		unsigned Wtemp2 = RegInfo.createVirtualRegister(&Mips::MSA128WRegClass);
		unsigned Wtemp3 = RegInfo.createVirtualRegister(&Mips::MSA128WRegClass);
		BuildMI(*BB, MI, DL, TII->get(Mips::INSERT_W), Wtemp2)
		.addReg(Wtemp)
		.addReg(Rtemp2)
		.addImm(1);
		BuildMI(*BB, MI, DL, TII->get(Mips::INSERT_W), Wtemp3)
		.addReg(Wtemp2)
		.addReg(Rtemp2)
		.addImm(3);
		WPHI = Wtemp3;
		}

		if (IsFGR64) {
		unsigned Wtemp2 = RegInfo.createVirtualRegister(&Mips::MSA128WRegClass);
		BuildMI(*BB, MI, DL, TII->get(Mips::FEXDO_W), Wtemp2)
		.addReg(WPHI)
		.addReg(WPHI);
		WPHI = Wtemp2;
		}

		BuildMI(*BB, MI, DL, TII->get(Mips::FEXDO_H), Wd).addReg(WPHI).addReg(WPHI);

		MI.eraseFromParent();
		return BB;
		}

		// Emit the FPEXTEND_PSEUDO instruction.
		//
		// Expand an f16 to either a FGR32Opnd or FGR64Opnd.
		//
		// Safety: Cycle the result through the GPRs so the result always ends up
		// in the correct floating point register.
		//
		// FIXME: This copying is strictly unnecessary. If we could tie FGR32Opnd:$Fd
		// / FGR64Opnd:$Fd and MSA128F16:$Ws to the same physical register
		// (which they can be, as the MSA registers are defined to alias the
		// FPU's 64 bit and 32 bit registers) the result can be accessed using
		// the correct register class. That requires operands be tie-able across
		// register classes which have a sub/super register class relationship. I
		// haven't checked.
		//
		// For FGR32Opnd:
		//
		// FPEXTEND FGR32Opnd:$fd, MSA128F16:$ws
		// =>
		// fexupr.w $wtemp, $ws
		// copy_s.w $rtemp, $wtemp[0]
		// mtc1 $rtemp, $fd
		//
		// For FGR64Opnd on Mips64:
		//
		// FPEXTEND FGR64Opnd:$fd, MSA128F16:$ws
		// =>
		// fexupr.w $wtemp, $ws
		// fexupr.d $wtemp2, $wtemp
		// copy_s.d $rtemp, $wtemp2[0]
		// dmtc1 $rtemp, $fd
		//
		// For FGR64Opnd on Mips32:
		//
		// FPEXTEND FGR64Opnd:$fd, MSA128F16:$ws
		// =>
		// fexupr.w $wtemp, $ws
		// fexupr.d $wtemp2, $wtemp
		// copy_s.w $rtemp, $wtemp2[0]
		// mtc1 $rtemp, $ftemp
		// copy_s.w $rtemp2, $wtemp2[1]
		// $fd = mthc1 $rtemp2, $ftemp
		//
		MachineBasicBlock *
		MipsSETargetLowering::emitFPEXTEND_PSEUDO(MachineInstr &MI,
		MachineBasicBlock *BB,
		bool IsFGR64) const {

		// Strictly speaking, we need MIPS32R5 to support MSA. We'll be generous
		// here. It's technically doable to support MIPS32 here, but the ISA forbids
		// it.
		assert(Subtarget.hasMSA() && Subtarget.hasMips32r2());

		bool IsFGR64onMips64 = Subtarget.hasMips64() && IsFGR64;
		bool IsFGR64onMips32 = !Subtarget.hasMips64() && IsFGR64;

		const TargetInstrInfo *TII = Subtarget.getInstrInfo();
		DebugLoc DL = MI.getDebugLoc();
		unsigned Fd = MI.getOperand(0).getReg();
		unsigned Ws = MI.getOperand(1).getReg();

		MachineRegisterInfo &RegInfo = BB->getParent()->getRegInfo();
		const TargetRegisterClass *GPRRC =
		IsFGR64onMips64 ? &Mips::GPR64RegClass : &Mips::GPR32RegClass;
		unsigned MTC1Opc = IsFGR64onMips64 ? Mips::DMTC1 : Mips::MTC1;
		unsigned COPYOpc = IsFGR64onMips64 ? Mips::COPY_S_D : Mips::COPY_S_W;

		unsigned Wtemp = RegInfo.createVirtualRegister(&Mips::MSA128WRegClass);
		unsigned WPHI = Wtemp;

		BuildMI(*BB, MI, DL, TII->get(Mips::FEXUPR_W), Wtemp).addReg(Ws);
		if (IsFGR64) {
		WPHI = RegInfo.createVirtualRegister(&Mips::MSA128DRegClass);
		BuildMI(*BB, MI, DL, TII->get(Mips::FEXUPR_D), WPHI).addReg(Wtemp);
		}

		// Perform the safety regclass copy mentioned above.
		unsigned Rtemp = RegInfo.createVirtualRegister(GPRRC);
		unsigned FPRPHI = IsFGR64onMips32
		? RegInfo.createVirtualRegister(&Mips::FGR64RegClass)
		: Fd;
		BuildMI(*BB, MI, DL, TII->get(COPYOpc), Rtemp).addReg(WPHI).addImm(0);
		BuildMI(*BB, MI, DL, TII->get(MTC1Opc), FPRPHI).addReg(Rtemp);

		if (IsFGR64onMips32) {
		unsigned Rtemp2 = RegInfo.createVirtualRegister(GPRRC);
		BuildMI(*BB, MI, DL, TII->get(Mips::COPY_S_W), Rtemp2)
		.addReg(WPHI)
		.addImm(1);
		BuildMI(*BB, MI, DL, TII->get(Mips::MTHC1_D64), Fd)
		.addReg(FPRPHI)
		.addReg(Rtemp2);
		}

		MI.eraseFromParent();
		return BB;
		}

// Emit the FEXP2_W_1 pseudo instructions.
//
// fexp2_w_1_pseudo $wd, $wt
// =>
// ldi.w $ws, 1
// fexp2.w $wd, $ws, $wt
MachineBasicBlock *
MipsSETargetLowering::emitFEXP2_W_1(MachineInstr &MI,
[... 49 lines not shown ...]