This is an archive of the discontinued LLVM Phabricator instance.

[X86] Remove X86ISD::FILD_FLAG and stop gluing nodes together.
Closed, Public

Authored by craig.topper on Jan 15 2020, 1:17 PM.

Details

Reviewers

RKSimon
spatel

Commits

rG5fa2022ec005: [X86] Remove X86ISD::FILD_FLAG and stop gluing nodes together.

Summary

I think whatever problem the gluing was fixing has long since been fixed. We don't have any of the restrictions on FP stack stuff that existed back when this was first added.

I had to change which type we use for FILD in BuildFILD when SSE is enabled, because most of the isel patterns block f32/f64 instructions when SSE1/SSE2 are enabled. So I needed to use the f80 pattern, but this shouldn't have an effect on the generated code since there is only one FILD instruction anyway. We already use f80 explicitly in other places.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

craig.topper created this revision. Jan 15 2020, 1:17 PM

Herald added a project: Restricted Project. Jan 15 2020, 1:17 PM

Herald added a subscriber: hiraditya.

RKSimon added inline comments. Jan 17 2020, 5:41 AM

llvm/test/CodeGen/X86/vec-strict-inttofp-256.ll
659	It's annoying that the update script hides the pointer math for stack accesses - it'd be good to check what aliasing is occurring. But it should be OK.

Rebase. Regenerated with a hacked script to show the real stack offsets for reviewing. I'll regenerate before committing so the next person who runs the script won't get a surprise.

Harbormaster completed remote builds in B44339: Diff 238945. Jan 18 2020, 12:54 AM

craig.topper marked an inline comment as done. Jan 18 2020, 1:01 AM

craig.topper added inline comments.

llvm/test/CodeGen/X86/vec-strict-inttofp-256.ll
636	I wonder if it would make sense to parallelize this. I think we can shift the v4i64 right by 32, truncate to v4i32, and use sitofp to convert that part to double, then multiply that by 2^32 in double. That should all be lossless. For the bottom 32 bits, we can mask with 0xffffffff, OR with the double representation of 2^52, and then subtract 2^52. This should also be lossless. Then we just add the two double vectors together, which should be the only step that does any rounding.
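
The arithmetic in the comment above can be modeled for a single lane in scalar C++. This is only a sketch of the idea, not the DAG lowering and not code from this patch: the function names are made up for illustration, and it assumes `double` is IEEE-754 binary64 evaluated without excess precision (for example SSE2 on x86-64) under the current rounding mode. It also says nothing about FP exception flags, which matter for the strict variants in these tests.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == 8, "sketch assumes a 64-bit double");

// Hypothetical scalar model of one lane of the proposed signed
// v4i64 -> v4f64 conversion. Names are illustrative, not LLVM identifiers.
inline double i64_to_f64_split(std::int64_t x) {
  const std::uint64_t bits = static_cast<std::uint64_t>(x);

  // High half: take the upper 32 bits as a signed i32 (the "shift right by
  // 32, truncate" step), convert to double, and scale by 2^32. Both the
  // conversion and the multiply are exact.
  const std::uint32_t hi_bits = static_cast<std::uint32_t>(bits >> 32);
  std::int32_t hi;
  std::memcpy(&hi, &hi_bits, sizeof hi);
  const double hi_d = static_cast<double>(hi) * 4294967296.0; // 2^32

  // Low half: OR the low 32 bits into the mantissa of 2^52, which yields the
  // double 2^52 + lo, then subtract 2^52. Both steps are exact.
  const std::uint64_t two52_bits = 0x4330000000000000ULL; // bits of 2^52
  const std::uint64_t lo_bits = (bits & 0xffffffffULL) | two52_bits;
  double lo_d;
  std::memcpy(&lo_d, &lo_bits, sizeof lo_d);
  lo_d -= 4503599627370496.0; // 2^52

  // The final add is the only step that can round.
  return hi_d + lo_d;
}

// Spot checks against the compiler's own i64 -> double conversion.
inline void self_check() {
  const std::int64_t samples[] = {0, 1, -1, INT64_MIN, INT64_MAX,
                                  (std::int64_t{1} << 53) + 1};
  for (std::int64_t x : samples)
    assert(i64_to_f64_split(x) == static_cast<double>(x));
}
```

Because the two partial values are exact, their sum is rounded once from the true integer value, which is why the result should agree with a direct conversion. A vector version would apply the same steps to all four lanes at once.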

Cheers Craig, TBH I'm left wondering whether we should tweak the update script to always keep stack arithmetic for x86 - do you think there'd be too much churn?

Anyway, LGTM as a first step, but it highlights a number of topics for further possible work (nothing new there......).

llvm/test/CodeGen/X86/vec-strict-inttofp-256.ll
636	@scanon or @andrew.w.kaylor should be able to confirm but that sounds alright to me
llvm/test/CodeGen/X86/vec-strict-inttofp-512.ll
425–426	It's a pity this doesn't spill to stack in order, which would've allowed us to load ymm0 with a single vmovups - lack of scalar/vector stack aliasing awareness has come up in some other bugs IIRC

This revision is now accepted and ready to land. Jan 18 2020, 6:56 AM

In D72805#1828063, @RKSimon wrote:

> Cheers Craig, TBH I'm left wondering whether we should tweak the update script to always keep stack arithmetic for x86 - do you think there'd be too much churn?

Would probably cause a lot of churn in the near term, but we regularly regenerate tests before patches so maybe not a big deal? Another option might be to just add an option to disable it, like the x86_scrub_rip/no_x86_scrub_rip option?

> Anyway, LGTM as a first step, but it highlights a number of topics for further possible work (nothing new there......).

Closed by commit rG5fa2022ec005: [X86] Remove X86ISD::FILD_FLAG and stop gluing nodes together. (authored by craig.topper). Jan 18 2020, 9:48 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                               Size
llvm/lib/Target/X86/X86ISelLowering.h              5 lines
llvm/lib/Target/X86/X86ISelLowering.cpp            20 lines
llvm/lib/Target/X86/X86InstrFPStack.td             17 lines
llvm/test/CodeGen/X86/vec-strict-inttofp-256.ll    5 lines
llvm/test/CodeGen/X86/vec-strict-inttofp-512.ll    26 lines

Diff 238975

llvm/lib/Target/X86/X86ISelLowering.h

@@ enum NodeType : unsigned {
     /// to the X86::FIST*m instructions and the rounding mode change stuff. It
     /// has two inputs (token chain and address) and two outputs (int value
     /// and token chain). Memory VT specifies the type to store to.
     FP_TO_INT_IN_MEM,

     /// This instruction implements SINT_TO_FP with the
     /// integer source in memory and FP reg result. This corresponds to the
     /// X86::FILD*m instructions. It has two inputs (token chain and address)
-    /// and two outputs (FP value and token chain). FILD_FLAG also produces a
-    /// flag). The integer source type is specified by the memory VT.
+    /// and two outputs (FP value and token chain). The integer source type is
+    /// specified by the memory VT.
     FILD,
-    FILD_FLAG,

     /// This instruction implements a fp->int store from FP stack
     /// slots. This corresponds to the fist instruction. It takes a
     /// chain operand, value to store, address, and glue. The memory VT
     /// specifies the type to store as.
     FIST,

     /// This instruction implements an extending load to FP stack slots.

llvm/lib/Target/X86/X86ISelLowering.cpp

@@
 std::pair<SDValue, SDValue> X86TargetLowering::BuildFILD(SDValue Op, EVT SrcVT, SDValue Chain,
     SDValue StackSlot,
     SelectionDAG &DAG) const {
   // Build the FILD
   SDLoc DL(Op);
   SDVTList Tys;
   bool useSSE = isScalarFPTypeInSSEReg(Op.getValueType());
   if (useSSE)
-    Tys = DAG.getVTList(MVT::f64, MVT::Other, MVT::Glue);
+    Tys = DAG.getVTList(MVT::f80, MVT::Other);
   else
     Tys = DAG.getVTList(Op.getValueType(), MVT::Other);

   unsigned ByteSize = SrcVT.getSizeInBits() / 8;

   FrameIndexSDNode *FI = dyn_cast<FrameIndexSDNode>(StackSlot);
   MachineMemOperand *LoadMMO;
   if (FI) {
     int SSFI = FI->getIndex();
     LoadMMO = DAG.getMachineFunction().getMachineMemOperand(
         MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), SSFI),
         MachineMemOperand::MOLoad, ByteSize, ByteSize);
   } else {
     LoadMMO = cast<LoadSDNode>(StackSlot)->getMemOperand();
     StackSlot = StackSlot.getOperand(1);
   }
   SDValue FILDOps[] = {Chain, StackSlot};
   SDValue Result =
-      DAG.getMemIntrinsicNode(useSSE ? X86ISD::FILD_FLAG : X86ISD::FILD, DL,
+      DAG.getMemIntrinsicNode(X86ISD::FILD, DL,
           Tys, FILDOps, SrcVT, LoadMMO);
   Chain = Result.getValue(1);

   if (useSSE) {
-    SDValue InFlag = Result.getValue(2);
-
-    // FIXME: Currently the FST is glued to the FILD_FLAG. This
-    // shouldn't be necessary except that RFP cannot be live across
-    // multiple blocks. When stackifier is fixed, they can be uncoupled.
     MachineFunction &MF = DAG.getMachineFunction();
     unsigned SSFISize = Op.getValueSizeInBits() / 8;
     int SSFI = MF.getFrameInfo().CreateStackObject(SSFISize, SSFISize, false);
     auto PtrVT = getPointerTy(MF.getDataLayout());
     SDValue StackSlot = DAG.getFrameIndex(SSFI, PtrVT);
     Tys = DAG.getVTList(MVT::Other);
-    SDValue FSTOps[] = {Chain, Result, StackSlot, InFlag};
+    SDValue FSTOps[] = {Chain, Result, StackSlot};
     MachineMemOperand *StoreMMO = DAG.getMachineFunction().getMachineMemOperand(
         MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), SSFI),
         MachineMemOperand::MOStore, SSFISize, SSFISize);

     Chain = DAG.getMemIntrinsicNode(X86ISD::FST, DL, Tys, FSTOps,
         Op.getValueType(), StoreMMO);
     Result = DAG.getLoad(
         Op.getValueType(), DL, Chain, StackSlot,
@@ if (!Subtarget.useSoftFloat() && !NoImplicitFloatOps) {
         DAG.getIntPtrConstant(0, dl));
     Results.push_back(Res);
     Results.push_back(Ld.getValue(1));
     return;
   }
   if (Subtarget.hasX87()) {
     // First load this into an 80-bit X87 register. This will put the whole
     // integer into the significand.
-    // FIXME: Do we need to glue? See FIXME comment in BuildFILD.
-    SDVTList Tys = DAG.getVTList(MVT::f80, MVT::Other, MVT::Glue);
+    SDVTList Tys = DAG.getVTList(MVT::f80, MVT::Other);
     SDValue Ops[] = { Node->getChain(), Node->getBasePtr() };
-    SDValue Result = DAG.getMemIntrinsicNode(X86ISD::FILD_FLAG,
+    SDValue Result = DAG.getMemIntrinsicNode(X86ISD::FILD,
         dl, Tys, Ops, MVT::i64,
         Node->getMemOperand());
     SDValue Chain = Result.getValue(1);
-    SDValue InFlag = Result.getValue(2);

     // Now store the X87 register to a stack temporary and convert to i64.
     // This store is not atomic and doesn't need to be.
     // FIXME: We don't need a stack temporary if the result of the load
     // is already being stored. We could just directly store there.
     SDValue StackPtr = DAG.CreateStackTemporary(MVT::i64);
     int SPFI = cast<FrameIndexSDNode>(StackPtr.getNode())->getIndex();
     MachinePointerInfo MPI =
         MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), SPFI);
-    SDValue StoreOps[] = { Chain, Result, StackPtr, InFlag };
+    SDValue StoreOps[] = { Chain, Result, StackPtr };
     Chain = DAG.getMemIntrinsicNode(X86ISD::FIST, dl,
         DAG.getVTList(MVT::Other), StoreOps,
         MVT::i64, MPI, 0 /*Align*/,
         MachineMemOperand::MOStore);

     // Finally load the value back from the stack temporary and return it.
     // This load is not atomic and doesn't need to be.
     // This load will be further type legalized.
@@ const char *X86TargetLowering::getTargetNodeName(unsigned Opcode) const {
   case X86ISD::BSR: return "X86ISD::BSR";
   case X86ISD::SHLD: return "X86ISD::SHLD";
   case X86ISD::SHRD: return "X86ISD::SHRD";
   case X86ISD::FAND: return "X86ISD::FAND";
   case X86ISD::FANDN: return "X86ISD::FANDN";
   case X86ISD::FOR: return "X86ISD::FOR";
   case X86ISD::FXOR: return "X86ISD::FXOR";
   case X86ISD::FILD: return "X86ISD::FILD";
-  case X86ISD::FILD_FLAG: return "X86ISD::FILD_FLAG";
   case X86ISD::FIST: return "X86ISD::FIST";
   case X86ISD::FP_TO_INT_IN_MEM: return "X86ISD::FP_TO_INT_IN_MEM";
   case X86ISD::FLD: return "X86ISD::FLD";
   case X86ISD::FST: return "X86ISD::FST";
   case X86ISD::CALL: return "X86ISD::CALL";
   case X86ISD::BT: return "X86ISD::BT";
   case X86ISD::CMP: return "X86ISD::CMP";
   case X86ISD::STRICT_FCMP: return "X86ISD::STRICT_FCMP";

llvm/lib/Target/X86/X86InstrFPStack.td

@@
 def SDTX86Fist : SDTypeProfile<0, 2, [SDTCisFP<0>, SDTCisPtrTy<1>]>;
 def SDTX86Fnstsw : SDTypeProfile<1, 1, [SDTCisVT<0, i16>, SDTCisVT<1, i16>]>;

 def SDTX86CwdStore : SDTypeProfile<0, 1, [SDTCisPtrTy<0>]>;

 def X86fld : SDNode<"X86ISD::FLD", SDTX86Fld,
     [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
 def X86fst : SDNode<"X86ISD::FST", SDTX86Fst,
-    [SDNPHasChain, SDNPOptInGlue, SDNPMayStore,
-    SDNPMemOperand]>;
+    [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
 def X86fild : SDNode<"X86ISD::FILD", SDTX86Fild,
     [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
-def X86fildflag : SDNode<"X86ISD::FILD_FLAG", SDTX86Fild,
-    [SDNPHasChain, SDNPOutGlue, SDNPMayLoad,
-    SDNPMemOperand]>;
 def X86fist : SDNode<"X86ISD::FIST", SDTX86Fist,
-    [SDNPHasChain, SDNPOptInGlue, SDNPMayStore,
-    SDNPMemOperand]>;
+    [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
 def X86fp_stsw : SDNode<"X86ISD::FNSTSW16r", SDTX86Fnstsw>;
 def X86fp_to_mem : SDNode<"X86ISD::FP_TO_INT_IN_MEM", SDTX86Fst,
     [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
 def X86fp_cwd_get16 : SDNode<"X86ISD::FNSTCW16m", SDTX86CwdStore,
     [SDNPHasChain, SDNPMayStore, SDNPSideEffect,
     SDNPMemOperand]>;

 def X86fstf32 : PatFrag<(ops node:$val, node:$ptr),
@@
 }]>;
 def X86fild32 : PatFrag<(ops node:$ptr), (X86fild node:$ptr), [{
   return cast<MemIntrinsicSDNode>(N)->getMemoryVT() == MVT::i32;
 }]>;
 def X86fild64 : PatFrag<(ops node:$ptr), (X86fild node:$ptr), [{
   return cast<MemIntrinsicSDNode>(N)->getMemoryVT() == MVT::i64;
 }]>;

-def X86fildflag64 : PatFrag<(ops node:$ptr), (X86fildflag node:$ptr), [{
-  return cast<MemIntrinsicSDNode>(N)->getMemoryVT() == MVT::i64;
-}]>;
-
 def X86fist64 : PatFrag<(ops node:$val, node:$ptr),
     (X86fist node:$val, node:$ptr), [{
   return cast<MemIntrinsicSDNode>(N)->getMemoryVT() == MVT::i64;
 }]>;

 def X86fp_to_i16mem : PatFrag<(ops node:$val, node:$ptr),
     (X86fp_to_mem node:$val, node:$ptr), [{
   return cast<MemIntrinsicSDNode>(N)->getMemoryVT() == MVT::i16;
@@
 // Floating point constant -0.0 and -1.0
 def : Pat<(f32 fpimmneg0), (CHS_Fp32 (LD_Fp032))>, Requires<[FPStackf32]>;
 def : Pat<(f32 fpimmneg1), (CHS_Fp32 (LD_Fp132))>, Requires<[FPStackf32]>;
 def : Pat<(f64 fpimmneg0), (CHS_Fp64 (LD_Fp064))>, Requires<[FPStackf64]>;
 def : Pat<(f64 fpimmneg1), (CHS_Fp64 (LD_Fp164))>, Requires<[FPStackf64]>;
 def : Pat<(f80 fpimmneg0), (CHS_Fp80 (LD_Fp080))>;
 def : Pat<(f80 fpimmneg1), (CHS_Fp80 (LD_Fp180))>;

-// Used to conv. i64 to f64 since there isn't a SSE version.
-def : Pat<(X86fildflag64 addr:$src), (ILD_Fp64m64 addr:$src)>;
-
 // Used to conv. between f80 and i64 for i64 atomic loads.
-def : Pat<(X86fildflag64 addr:$src), (ILD_Fp64m80 addr:$src)>;
 def : Pat<(X86fist64 RFP80:$src, addr:$op), (IST_Fp64m80 addr:$op, RFP80:$src)>;

 // FP extensions map onto simple pseudo-value conversions if they are to/from
 // the FP stack.
 def : Pat<(f64 (any_fpextend RFP32:$src)), (COPY_TO_REGCLASS RFP32:$src, RFP64)>,
     Requires<[FPStackf32]>;
 def : Pat<(f80 (any_fpextend RFP32:$src)), (COPY_TO_REGCLASS RFP32:$src, RFP80)>,
     Requires<[FPStackf32]>;

llvm/test/CodeGen/X86/vec-strict-inttofp-256.ll

@@ %result = call <4 x double> @llvm.experimental.constrained.uitofp.v4f64.v4i32(<4 x i32> %x,
     metadata !"round.dynamic",
     metadata !"fpexcept.strict") #0
   ret <4 x double> %result
 }

 define <4 x double> @sitofp_v4i64_v4f64(<4 x i64> %x) #0 {
 ; AVX-32-LABEL: sitofp_v4i64_v4f64:
 ; AVX-32: # %bb.0:
 ; AVX-32-NEXT: pushl %ebp
 ; AVX-32-NEXT: .cfi_def_cfa_offset 8
 ; AVX-32-NEXT: .cfi_offset %ebp, -8
 ; AVX-32-NEXT: movl %esp, %ebp
 ; AVX-32-NEXT: .cfi_def_cfa_register %ebp
 ; AVX-32-NEXT: andl $-8, %esp
 ; AVX-32-NEXT: subl $64, %esp
 ; AVX-32-NEXT: vmovlps %xmm0, {{[0-9]+}}(%esp)
 ; AVX-32-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[2,3,0,1]
 ; AVX-32-NEXT: vmovlps %xmm1, {{[0-9]+}}(%esp)
 ; AVX-32-NEXT: vextractf128 $1, %ymm0, %xmm0
 ; AVX-32-NEXT: vmovlps %xmm0, {{[0-9]+}}(%esp)
 ; AVX-32-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[2,3,0,1]
 ; AVX-32-NEXT: vmovlps %xmm0, {{[0-9]+}}(%esp)
 ; AVX-32-NEXT: fildll {{[0-9]+}}(%esp)
 ; AVX-32-NEXT: fstpl {{[0-9]+}}(%esp)
 ; AVX-32-NEXT: fildll {{[0-9]+}}(%esp)
 ; AVX-32-NEXT: fstpl {{[0-9]+}}(%esp)
-; AVX-32-NEXT: wait
-; AVX-32-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
-; AVX-32-NEXT: vmovhps {{.*#+}} xmm0 = xmm0[0,1],mem[0,1]
 ; AVX-32-NEXT: fildll {{[0-9]+}}(%esp)
 ; AVX-32-NEXT: fstpl {{[0-9]+}}(%esp)
 ; AVX-32-NEXT: fildll {{[0-9]+}}(%esp)
 ; AVX-32-NEXT: fstpl (%esp)
 ; AVX-32-NEXT: wait
+; AVX-32-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
+; AVX-32-NEXT: vmovhps {{.*#+}} xmm0 = xmm0[0,1],mem[0,1]
 ; AVX-32-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
 ; AVX-32-NEXT: vmovhps {{.*#+}} xmm1 = xmm1[0,1],mem[0,1]
 ; AVX-32-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
 ; AVX-32-NEXT: movl %ebp, %esp
 ; AVX-32-NEXT: popl %ebp
 ; AVX-32-NEXT: .cfi_def_cfa %esp, 4
 ; AVX-32-NEXT: retl
 ;

llvm/test/CodeGen/X86/vec-strict-inttofp-512.ll

@@
 ; NODQ-32-NEXT: vextractf128 $1, %ymm0, %xmm0
 ; NODQ-32-NEXT: vmovlps %xmm0, {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[2,3,0,1]
 ; NODQ-32-NEXT: vmovlps %xmm0, {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fildll {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fstpl {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fildll {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fstpl {{[0-9]+}}(%esp)
-; NODQ-32-NEXT: wait
-; NODQ-32-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
-; NODQ-32-NEXT: vmovhps {{.*#+}} xmm0 = xmm0[0,1],mem[0,1]
 ; NODQ-32-NEXT: fildll {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fstpl {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fildll {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fstpl (%esp)
-; NODQ-32-NEXT: wait
-; NODQ-32-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
-; NODQ-32-NEXT: vmovhps {{.*#+}} xmm1 = xmm1[0,1],mem[0,1]
-; NODQ-32-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
 ; NODQ-32-NEXT: fildll {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fstpl {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fildll {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fstpl {{[0-9]+}}(%esp)
-; NODQ-32-NEXT: wait
-; NODQ-32-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
-; NODQ-32-NEXT: vmovhps {{.*#+}} xmm1 = xmm1[0,1],mem[0,1]
 ; NODQ-32-NEXT: fildll {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fstpl {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fildll {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fstpl {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: wait
+; NODQ-32-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
+; NODQ-32-NEXT: vmovhps {{.*#+}} xmm0 = xmm0[0,1],mem[0,1]
+; NODQ-32-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
+; NODQ-32-NEXT: vmovhps {{.*#+}} xmm1 = xmm1[0,1],mem[0,1]
+; NODQ-32-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
+; NODQ-32-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
+; NODQ-32-NEXT: vmovhps {{.*#+}} xmm1 = xmm1[0,1],mem[0,1]
 ; NODQ-32-NEXT: vmovsd {{.*#+}} xmm2 = mem[0],zero
 ; NODQ-32-NEXT: vmovhps {{.*#+}} xmm2 = xmm2[0,1],mem[0,1]
 ; NODQ-32-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1
 ; NODQ-32-NEXT: vinsertf64x4 $1, %ymm0, %zmm1, %zmm0
 ; NODQ-32-NEXT: movl %ebp, %esp
 ; NODQ-32-NEXT: popl %ebp
 ; NODQ-32-NEXT: .cfi_def_cfa %esp, 4
 ; NODQ-32-NEXT: retl
@@
 ; NODQ-32-NEXT: vmovlps %xmm0, {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fildll {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fstps {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fildll {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fstps {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fildll {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fstps {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fildll {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fstps {{[0-9]+}}(%esp)
-; NODQ-32-NEXT: wait
-; NODQ-32-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; NODQ-32-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[2,3]
-; NODQ-32-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; NODQ-32-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],mem[0]
 ; NODQ-32-NEXT: fildll {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fstps {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fildll {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fstps {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fildll {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fstps {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fildll {{[0-9]+}}(%esp)
 ; NODQ-32-NEXT: fstps (%esp)
 ; NODQ-32-NEXT: wait
+; NODQ-32-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; NODQ-32-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[2,3]
+; NODQ-32-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1],mem[0],xmm0[3]
+; NODQ-32-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],mem[0]
 ; NODQ-32-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
 ; NODQ-32-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0],mem[0],xmm1[2,3]
 ; NODQ-32-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],mem[0],xmm1[3]
 ; NODQ-32-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1,2],mem[0]
 ; NODQ-32-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
 ; NODQ-32-NEXT: movl %ebp, %esp
 ; NODQ-32-NEXT: popl %ebp
 ; NODQ-32-NEXT: .cfi_def_cfa %esp, 4