
Details

Reviewers

icedrocket
RKSimon
lebedev.ri
pengfei

Commits

rG11fb09ec0afa: [X86] Change precision control to FP80 during u64->fp32 conversion on Windows.
rG928a1764d6bd: [X86][WIP] Change precision control to FP80 during u64->fp32 conversion on…

Summary

This is an alternative to D141074 to fix the problem by adjusting
the precision control dynamically.

Posting for early review so we can do some testing of this solution.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

craig.topper created this revision. Jan 19 2023, 10:31 PM

Herald added a project: Restricted Project. Jan 19 2023, 10:31 PM

Herald added subscribers: StephenFan, pengfei, hiraditya.

craig.topper requested review of this revision. Jan 19 2023, 10:31 PM

Herald added a project: Restricted Project. Jan 19 2023, 10:31 PM

Harbormaster completed remote builds in B208902: Diff 490720. Jan 20 2023, 12:13 AM

This revision was not accepted when it landed; it landed in state Needs Review. Jan 20 2023, 12:34 AM

Closed by commit rG928a1764d6bd: [X86][WIP] Change precision control to FP80 during u64->fp32 conversion on… (authored by craig.topper).

This revision was automatically updated to reflect the committed changes.

craig.topper added a commit: rG928a1764d6bd: [X86][WIP] Change precision control to FP80 during u64->fp32 conversion on….

craig.topper added a reverting change: rGf4fa34c35915: Revert "[X86][WIP] Change precision control to FP80 during u64->fp32 conversion…. Jan 20 2023, 12:41 AM

Committed by accident

craig.topper added a reviewer: pengfei. Jan 20 2023, 12:42 AM

I tested the patch and it seems to have fixed the issue.

This looks like a good way to solve the problem.

llvm/lib/Target/X86/X86ISelLowering.h
896–897	This should be on the same line.

icedrocket added inline comments. Jan 23 2023, 6:13 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
37331–37354	What about just setting the value to 0x37f instead of applying a bitwise OR to the original value? That would save two registers, and we would only need two bytes of memory to hold the static value.

icedrocket added a comment. Jan 23 2023, 10:20 AM

This comment was removed by icedrocket.

icedrocket added inline comments. Jan 23 2023, 10:03 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
37331–37354	Ah, there is still the possibility of an exception being thrown due to a stack overflow. What if the original value's rounding mode is not round to nearest?

craig.topper added inline comments. Jan 23 2023, 10:45 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
37331–37354	The compiler is managing the stack so there shouldn't be any overflow unless someone uses assembly or something to add things to the stack that the compiler doesn't know about. Rounding mode doesn't matter because the fadd is not supposed to round. Any integer that can be created should fit perfectly in an 80-bit FP.

icedrocket added inline comments. Jan 24 2023, 12:15 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
37331–37354	Yes, and FADD doesn't seem to cause a stack overflow, because it doesn't push on the stack. I forgot that the modified control word only applies to FADD. So, as long as we only use FP80_ADD on LowerUINT_TO_FP, no exception will be thrown.
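The trade-off in this thread (OR 0x300 into the saved control word versus storing the constant 0x37f) can be sketched with plain integer arithmetic. This is only an illustration: the field positions follow the x87 control word layout (precision control in bits 8-9, rounding control in bits 10-11, exception masks in bits 0-5), while the constant and function names are hypothetical and do not appear in the patch.

```cpp
#include <cstdint>

// x87 control word fields. Names are illustrative, not LLVM identifiers.
constexpr std::uint16_t kPrecisionMask = 0x0300; // bits 8-9: precision control
constexpr std::uint16_t kRoundingMask = 0x0C00;  // bits 10-11: rounding control
constexpr std::uint16_t kExceptionMask = 0x003F; // bits 0-5: exception masks

enum class Precision : std::uint16_t {
  Single = 0x0000,   // 24-bit significand
  Double = 0x0200,   // 53-bit significand
  Extended = 0x0300, // 64-bit significand
};

// Extract the precision control field from a control word.
constexpr Precision precisionOf(std::uint16_t cw) {
  return static_cast<Precision>(cw & kPrecisionMask);
}

// Mirrors the OR with 0x300 in the patch: force extended precision and
// leave the rounding mode and exception masks as the program set them.
constexpr std::uint16_t withExtendedPrecision(std::uint16_t cw) {
  return static_cast<std::uint16_t>(cw | kPrecisionMask);
}
```

For a control word of 0x027F (53-bit precision, round to nearest, all exceptions masked, assumed here as a typical Windows value), both approaches give 0x037F. They differ once the program has changed the rounding mode: 0x0E7F becomes 0x0F7F with the OR, while a hard-coded 0x37f would also reset rounding control for the duration of the add. As noted in the thread, that difference is harmless for this particular FADD because the add is exact.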

Use the load version of fadd when possible. Only handling the case of loading an f32 value.

Harbormaster completed remote builds in B209552: Diff 491634. Jan 24 2023, 1:44 AM

craig.topper retitled this revision from [X86][WIP] Change precision control to FP80 during u64->fp32 conversion on Windows. to [X86] Change precision control to FP80 during u64->fp32 conversion on Windows. Jan 29 2023, 10:14 PM

craig.topper edited the summary of this revision. (Show Details)

Could you update the diff? The current diff is outdated and cannot be applied to the main branch automatically.

Rebase

icedrocket added inline comments. Feb 2 2023, 10:05 AM

llvm/lib/Target/X86/X86ISelLowering.h
896–897	Is this comment change intended? Other than this, everything looks good.
llvm/test/CodeGen/X86/uint64-to-float.ll
64	I checked the assembly generated by clang and it seems that fadds is split into fld and fadd.

craig.topper added inline comments. Feb 2 2023, 10:07 AM

llvm/lib/Target/X86/X86ISelLowering.h
896–897	not sure what happened there. I'll fix

Remove unrelated comment change. The line is longer than 80 columns, but isn't near code this is touching.

craig.topper added inline comments. Feb 2 2023, 10:16 AM

llvm/test/CodeGen/X86/uint64-to-float.ll
64	That's weird. Do you have a C file you can share?

icedrocket added inline comments. Feb 2 2023, 10:44 AM

llvm/test/CodeGen/X86/uint64-to-float.ll
64	The file is the same as the code in the summary of D141074. I think there is no actual `fadds` instruction in x87, so it ends up split into two instructions.

Harbormaster completed remote builds in B211531: Diff 494353. Feb 2 2023, 11:15 AM

In D142178#4100370, @craig.topper wrote:

Remove unrelated comment change. The line is longer than 80 columns, but isn't near code this is touching.

clang-format test failed. We need to fix the comment.

In D142178#4100598, @icedrocket wrote:

In D142178#4100370, @craig.topper wrote:

Remove unrelated comment change. The line is longer than 80 columns, but isn't near code this is touching.

clang-format test failed. We need to fix the comment.

I'll fix in a separate pre-commit. It's a distraction for this review.

Rebase after fixing 80 column on trunk

Fix warning about ISD and X86ISD being different enums for a conditional operator.

craig.topper added inline comments. Feb 2 2023, 12:53 PM

llvm/test/CodeGen/X86/uint64-to-float.ll
64	Did you check the assembly without optimizations enabled? Folding the load into the fadd is only done with optimizations enabled.

Harbormaster completed remote builds in B211560: Diff 494408. Feb 2 2023, 1:47 PM

icedrocket added inline comments. Feb 2 2023, 10:14 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
22016	Should we only apply this to the conversion to f32? Conversion to f64 might also have precision issues though I can't prove it.
llvm/test/CodeGen/X86/uint64-to-float.ll
64	You're right. I tested again with another code and it works as you mentioned.

icedrocket added inline comments. Feb 4 2023, 4:04 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
22016	LowerUINT_TO_FP_i64 also uses FP64 addition. If u64 to f64 conversion is problematic because FP80 addition is not used, then implementations using SSE2 will also be problematic. So unless we find a broken case, I think it's better to leave it alone.

I wrote some code to verify that u64 to f64 conversion using FILD+FADD with 53-bit precision is accurate. I tested it with 10^12 cases, and the results are: u64 to f32 conversion failed frequently, but u64 to f64 conversion did not fail.

https://reviews.llvm.org/F26366621
https://reviews.llvm.org/F26366622

LGTM

This revision is now accepted and ready to land. Feb 4 2023, 7:03 AM

In D142178#4104248, @icedrocket wrote:

I wrote some code to verify that u64 to f64 conversion using FILD+FADD with 53-bit precision is accurate. I tested it with 10^12 cases, and the results are: u64 to f32 conversion failed frequently, but u64 to f64 conversion did not fail.

https://reviews.llvm.org/F26366621
https://reviews.llvm.org/F26366622

The u64->f64 conversion can only fail if PC is set to single precision. The original bug we are fixing is that with PC set to double precision we round from fp80 to fp64 then to fp32. If PC is set to single precision we have a lot more problems with f64.
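The double rounding described here can be modeled with integer arithmetic alone. The sketch below is not the x87 sequence and not the test program attached above; `roundToPrecision` is a hypothetical helper that rounds an integer to a given number of significant bits with ties to even, which is enough to compare "round once to 24 bits" against "round to 53 bits, then to 24 bits".

```cpp
#include <cassert>
#include <cstdint>

// Round v to the nearest integer that has at most `bits` significant bits,
// breaking ties to even. Model only: a result that would round up to 2^64
// wraps to 0, so callers should avoid inputs that close to the top.
std::uint64_t roundToPrecision(std::uint64_t v, int bits) {
  assert(bits >= 1 && bits <= 64);
  int width = 0; // number of significant bits in v
  for (std::uint64_t t = v; t != 0; t >>= 1)
    ++width;
  if (width <= bits)
    return v; // already representable, nothing to round
  const int shift = width - bits; // number of low bits to discard (1..63)
  std::uint64_t kept = v >> shift;
  const std::uint64_t rem = v & ((std::uint64_t{1} << shift) - 1);
  const std::uint64_t half = std::uint64_t{1} << (shift - 1);
  if (rem > half || (rem == half && (kept & 1) != 0))
    ++kept; // round up: above halfway, or exactly halfway with odd kept part
  return kept << shift;
}

// PC = extended: the add is exact, so only the final store to f32 rounds.
std::uint64_t toF32Direct(std::uint64_t v) { return roundToPrecision(v, 24); }

// PC = double: the add rounds to 53 bits, then the store rounds to 24 bits.
std::uint64_t toF32ViaF64(std::uint64_t v) {
  return roundToPrecision(roundToPrecision(v, 53), 24);
}
```

Take v = 2^63 + 2^39 + 1. Rounded once, it lies just above the midpoint between two adjacent f32 values and goes up to 2^63 + 2^40. Rounded to 53 bits first, the trailing 1 is dropped, the value becomes an exact tie, and ties-to-even sends it down to 2^63. Rounding to 53 bits twice changes nothing, which is consistent with the observation that u64->f64 is only at risk when PC is set to single precision.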

Closed by commit rG11fb09ec0afa: [X86] Change precision control to FP80 during u64->fp32 conversion on Windows. (authored by craig.topper). Feb 6 2023, 7:35 AM

This revision was automatically updated to reflect the committed changes.

craig.topper added a commit: rG11fb09ec0afa: [X86] Change precision control to FP80 during u64->fp32 conversion on Windows..

Diff 491634

llvm/lib/Target/X86/X86ISelLowering.h

//===-- X86ISelLowering.h - X86 DAG Lowering Interface ----------- C++ --===//		//===-- X86ISelLowering.h - X86 DAG Lowering Interface ----------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
▲ Show 20 Lines • Show All 726 Lines • ▼ Show 20 Lines	enum NodeType : unsigned {
ENQCMDS,		ENQCMDS,

// For avx512-vp2intersect		// For avx512-vp2intersect
VP2INTERSECT,		VP2INTERSECT,

// User level interrupts - testui		// User level interrupts - testui
TESTUI,		TESTUI,

		// Perform an FP80 add after changing precision control in FPCW.
		FP80_ADD,

/// X86 strict FP compare instructions.		/// X86 strict FP compare instructions.
STRICT_FCMP = ISD::FIRST_TARGET_STRICTFP_OPCODE,		STRICT_FCMP = ISD::FIRST_TARGET_STRICTFP_OPCODE,
STRICT_FCMPS,		STRICT_FCMPS,

// Vector packed double/float comparison.		// Vector packed double/float comparison.
STRICT_CMPP,		STRICT_CMPP,

/// Vector comparison generating mask bits for fp and		/// Vector comparison generating mask bits for fp and
Show All 23 Lines	enum NodeType : unsigned {
STRICT_FNMADD,		STRICT_FNMADD,
STRICT_FMSUB,		STRICT_FMSUB,
STRICT_FNMSUB,		STRICT_FNMSUB,

// Conversions between float and half-float.		// Conversions between float and half-float.
STRICT_CVTPS2PH,		STRICT_CVTPS2PH,
STRICT_CVTPH2PS,		STRICT_CVTPH2PS,

		// Perform an FP80 add after changing precision control in FPCW.
		STRICT_FP80_ADD,

// WARNING: Only add nodes here if they are strict FP nodes. Non-memory and		// WARNING: Only add nodes here if they are strict FP nodes. Non-memory and
// non-strict FP nodes should be above FIRST_TARGET_STRICTFP_OPCODE.		// non-strict FP nodes should be above FIRST_TARGET_STRICTFP_OPCODE.

// Compare and swap.		// Compare and swap.
LCMPXCHG_DAG = ISD::FIRST_TARGET_MEMORY_OPCODE,		LCMPXCHG_DAG = ISD::FIRST_TARGET_MEMORY_OPCODE,
LCMPXCHG8_DAG,		LCMPXCHG8_DAG,
LCMPXCHG16_DAG,		LCMPXCHG16_DAG,
LCMPXCHG16_SAVE_RBX_DAG,		LCMPXCHG16_SAVE_RBX_DAG,
▲ Show 20 Lines • Show All 91 Lines • ▼ Show 20 Lines	enum NodeType : unsigned {
AESENC256KL,		AESENC256KL,
AESDEC256KL,		AESDEC256KL,
AESENCWIDE128KL,		AESENCWIDE128KL,
AESDECWIDE128KL,		AESDECWIDE128KL,
AESENCWIDE256KL,		AESENCWIDE256KL,
AESDECWIDE256KL,		AESDECWIDE256KL,

/// Compare and Add if Condition is Met. Compare value in operand 2 with		/// Compare and Add if Condition is Met. Compare value in operand 2 with
/// value in memory of operand 1. If condition of operand 4 is met, add value		/// value in memory of operand 1. If condition of operand 4 is met, add
		/// value
/// operand 3 to m32 and write new value in operand 1. Operand 2 is		/// operand 3 to m32 and write new value in operand 1. Operand 2 is
/// always updated with the original value from operand 1.		/// always updated with the original value from operand 1.
CMPCCXADD,		CMPCCXADD,

// Save xmm argument registers to the stack, according to %al. An operator		// Save xmm argument registers to the stack, according to %al. An operator
// is needed so that this can be expanded with control flow.		// is needed so that this can be expanded with control flow.
VASTART_SAVE_XMM_REGS,		VASTART_SAVE_XMM_REGS,

// WARNING: Do not add anything in the end unless you want the node to		// WARNING: Do not add anything in the end unless you want the node to
▲ Show 20 Lines • Show All 957 Lines • Show Last 20 Lines

llvm/lib/Target/X86/X86ISelLowering.cpp

//===-- X86ISelLowering.cpp - X86 DAG Lowering Implementation -------------===//		//===-- X86ISelLowering.cpp - X86 DAG Lowering Implementation -------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
▲ Show 20 Lines • Show All 21,986 Lines • ▼ Show 20 Lines	SDValue X86TargetLowering::LowerUINT_TO_FP(SDValue Op,
SDValue Fudge = DAG.getExtLoad(		SDValue Fudge = DAG.getExtLoad(
ISD::EXTLOAD, dl, MVT::f80, Chain, FudgePtr,		ISD::EXTLOAD, dl, MVT::f80, Chain, FudgePtr,
MachinePointerInfo::getConstantPool(DAG.getMachineFunction()), MVT::f32,		MachinePointerInfo::getConstantPool(DAG.getMachineFunction()), MVT::f32,
CPAlignment);		CPAlignment);
Chain = Fudge.getValue(1);		Chain = Fudge.getValue(1);
// Extend everything to 80 bits to force it to be done on x87.		// Extend everything to 80 bits to force it to be done on x87.
// TODO: Are there any fast-math-flags to propagate here?		// TODO: Are there any fast-math-flags to propagate here?
if (IsStrict) {		if (IsStrict) {
SDValue Add = DAG.getNode(ISD::STRICT_FADD, dl, {MVT::f80, MVT::Other},		unsigned Opc = Subtarget.isOSWindows() && DstVT == MVT::f32
{Chain, Fild, Fudge});		? X86ISD::STRICT_FP80_ADD
		: ISD::STRICT_FADD;
		SDValue Add =
		DAG.getNode(Opc, dl, {MVT::f80, MVT::Other}, {Chain, Fild, Fudge});
// STRICT_FP_ROUND can't handle equal types.		// STRICT_FP_ROUND can't handle equal types.
if (DstVT == MVT::f80)		if (DstVT == MVT::f80)
return Add;		return Add;
return DAG.getNode(ISD::STRICT_FP_ROUND, dl, {DstVT, MVT::Other},		return DAG.getNode(ISD::STRICT_FP_ROUND, dl, {DstVT, MVT::Other},
{Add.getValue(1), Add, DAG.getIntPtrConstant(0, dl)});		{Add.getValue(1), Add, DAG.getIntPtrConstant(0, dl)});
}		}
SDValue Add = DAG.getNode(ISD::FADD, dl, MVT::f80, Fild, Fudge);		unsigned Opc = Subtarget.isOSWindows() && DstVT == MVT::f32 ? X86ISD::FP80_ADD
		: ISD::FADD;
		SDValue Add = DAG.getNode(Opc, dl, MVT::f80, Fild, Fudge);
return DAG.getNode(ISD::FP_ROUND, dl, DstVT, Add,		return DAG.getNode(ISD::FP_ROUND, dl, DstVT, Add,
DAG.getIntPtrConstant(0, dl, /*isTarget=*/true));		DAG.getIntPtrConstant(0, dl, /*isTarget=*/true));
}		}

// If the given FP_TO_SINT (IsSigned) or FP_TO_UINT (!IsSigned) operation		// If the given FP_TO_SINT (IsSigned) or FP_TO_UINT (!IsSigned) operation
// is legal, or has an fp128 or f16 source (which needs to be promoted to f32),		// is legal, or has an fp128 or f16 source (which needs to be promoted to f32),
// just return an SDValue().		// just return an SDValue().
// Otherwise it is assumed to be a conversion from one of f32, f64 or f80		// Otherwise it is assumed to be a conversion from one of f32, f64 or f80
▲ Show 20 Lines • Show All 12,778 Lines • ▼ Show 20 Lines	#define NODE_NAME_CASE(NODE) case X86ISD::NODE: return "X86ISD::" #NODE;
NODE_NAME_CASE(AESENC256KL)		NODE_NAME_CASE(AESENC256KL)
NODE_NAME_CASE(AESDEC256KL)		NODE_NAME_CASE(AESDEC256KL)
NODE_NAME_CASE(AESENCWIDE128KL)		NODE_NAME_CASE(AESENCWIDE128KL)
NODE_NAME_CASE(AESDECWIDE128KL)		NODE_NAME_CASE(AESDECWIDE128KL)
NODE_NAME_CASE(AESENCWIDE256KL)		NODE_NAME_CASE(AESENCWIDE256KL)
NODE_NAME_CASE(AESDECWIDE256KL)		NODE_NAME_CASE(AESDECWIDE256KL)
NODE_NAME_CASE(CMPCCXADD)		NODE_NAME_CASE(CMPCCXADD)
NODE_NAME_CASE(TESTUI)		NODE_NAME_CASE(TESTUI)
		NODE_NAME_CASE(FP80_ADD)
		NODE_NAME_CASE(STRICT_FP80_ADD)
}		}
return nullptr;		return nullptr;
#undef NODE_NAME_CASE		#undef NODE_NAME_CASE
}		}

/// Return true if the addressing mode represented by AM is legal for this		/// Return true if the addressing mode represented by AM is legal for this
/// target, for a load/store of the specified type.		/// target, for a load/store of the specified type.
bool X86TargetLowering::isLegalAddressingMode(const DataLayout &DL,		bool X86TargetLowering::isLegalAddressingMode(const DataLayout &DL,
▲ Show 20 Lines • Show All 2,494 Lines • ▼ Show 20 Lines	unsigned PopF =
MI.getOpcode() == X86::WRFLAGS32 ? X86::POPF32 : X86::POPF64;		MI.getOpcode() == X86::WRFLAGS32 ? X86::POPF32 : X86::POPF64;
BuildMI(*BB, MI, DL, TII->get(Push)).addReg(MI.getOperand(0).getReg());		BuildMI(*BB, MI, DL, TII->get(Push)).addReg(MI.getOperand(0).getReg());
BuildMI(*BB, MI, DL, TII->get(PopF));		BuildMI(*BB, MI, DL, TII->get(PopF));

MI.eraseFromParent(); // The pseudo is gone now.		MI.eraseFromParent(); // The pseudo is gone now.
return BB;		return BB;
}		}

		case X86::FP80_ADDr:
		case X86::FP80_ADDm32: {
		// Change the floating point control register to use double extended
		// precision when performing the addition.
		int OrigCWFrameIdx =
		MF->getFrameInfo().CreateStackObject(2, Align(2), false);
		addFrameReference(BuildMI(*BB, MI, DL, TII->get(X86::FNSTCW16m)),
		OrigCWFrameIdx);

		// Load the old value of the control word...
		Register OldCW = MF->getRegInfo().createVirtualRegister(&X86::GR32RegClass);
		addFrameReference(BuildMI(*BB, MI, DL, TII->get(X86::MOVZX32rm16), OldCW),
		OrigCWFrameIdx);

		// OR 0b11 into bit 8 and 9. 0b11 is the encoding for double extended
		// precision.
		Register NewCW = MF->getRegInfo().createVirtualRegister(&X86::GR32RegClass);
		BuildMI(*BB, MI, DL, TII->get(X86::OR32ri), NewCW)
		.addReg(OldCW, RegState::Kill)
		.addImm(0x300);

		// Extract to 16 bits.
		Register NewCW16 =
		MF->getRegInfo().createVirtualRegister(&X86::GR16RegClass);
		BuildMI(*BB, MI, DL, TII->get(TargetOpcode::COPY), NewCW16)
		.addReg(NewCW, RegState::Kill, X86::sub_16bit);

		// Prepare memory for FLDCW.
		int NewCWFrameIdx =
		MF->getFrameInfo().CreateStackObject(2, Align(2), false);
		addFrameReference(BuildMI(*BB, MI, DL, TII->get(X86::MOV16mr)),
		NewCWFrameIdx)
		.addReg(NewCW16, RegState::Kill);

		// Reload the modified control word now...
		addFrameReference(BuildMI(*BB, MI, DL, TII->get(X86::FLDCW16m)),
		NewCWFrameIdx);

		// Do the addition.
		if (MI.getOpcode() == X86::FP80_ADDr) {
		BuildMI(*BB, MI, DL, TII->get(X86::ADD_Fp80))
		.add(MI.getOperand(0))
		.add(MI.getOperand(1))
		.add(MI.getOperand(2));
		} else {
		BuildMI(*BB, MI, DL, TII->get(X86::ADD_Fp80m32))
		.add(MI.getOperand(0))
		.add(MI.getOperand(1))
		.add(MI.getOperand(2))
		.add(MI.getOperand(3))
		.add(MI.getOperand(4))
		.add(MI.getOperand(5))
		.add(MI.getOperand(6));
		}

		// Reload the original control word now.
		addFrameReference(BuildMI(*BB, MI, DL, TII->get(X86::FLDCW16m)),
		OrigCWFrameIdx);

		MI.eraseFromParent(); // The pseudo instruction is gone now.
		return BB;
		}

case X86::FP32_TO_INT16_IN_MEM:		case X86::FP32_TO_INT16_IN_MEM:
case X86::FP32_TO_INT32_IN_MEM:		case X86::FP32_TO_INT32_IN_MEM:
case X86::FP32_TO_INT64_IN_MEM:		case X86::FP32_TO_INT64_IN_MEM:
case X86::FP64_TO_INT16_IN_MEM:		case X86::FP64_TO_INT16_IN_MEM:
case X86::FP64_TO_INT32_IN_MEM:		case X86::FP64_TO_INT32_IN_MEM:
case X86::FP64_TO_INT64_IN_MEM:		case X86::FP64_TO_INT64_IN_MEM:
case X86::FP80_TO_INT16_IN_MEM:		case X86::FP80_TO_INT16_IN_MEM:
case X86::FP80_TO_INT32_IN_MEM:		case X86::FP80_TO_INT32_IN_MEM:
▲ Show 20 Lines • Show All 20,454 Lines • Show Last 20 Lines

llvm/lib/Target/X86/X86InstrFPStack.td

Show All 20 Lines
def SDTX86Fst : SDTypeProfile<0, 2, [SDTCisFP<0>,		def SDTX86Fst : SDTypeProfile<0, 2, [SDTCisFP<0>,
SDTCisPtrTy<1>]>;		SDTCisPtrTy<1>]>;
def SDTX86Fild : SDTypeProfile<1, 1, [SDTCisFP<0>, SDTCisPtrTy<1>]>;		def SDTX86Fild : SDTypeProfile<1, 1, [SDTCisFP<0>, SDTCisPtrTy<1>]>;
def SDTX86Fist : SDTypeProfile<0, 2, [SDTCisFP<0>, SDTCisPtrTy<1>]>;		def SDTX86Fist : SDTypeProfile<0, 2, [SDTCisFP<0>, SDTCisPtrTy<1>]>;

def SDTX86CwdStore : SDTypeProfile<0, 1, [SDTCisPtrTy<0>]>;		def SDTX86CwdStore : SDTypeProfile<0, 1, [SDTCisPtrTy<0>]>;
def SDTX86CwdLoad : SDTypeProfile<0, 1, [SDTCisPtrTy<0>]>;		def SDTX86CwdLoad : SDTypeProfile<0, 1, [SDTCisPtrTy<0>]>;

		def X86fp80_add : SDNode<"X86ISD::FP80_ADD", SDTFPBinOp, [SDNPCommutative]>;
		def X86strict_fp80_add : SDNode<"X86ISD::STRICT_FP80_ADD", SDTFPBinOp,
		[SDNPHasChain,SDNPCommutative]>;
		def any_X86fp80_add : PatFrags<(ops node:$lhs, node:$rhs),
		[(X86strict_fp80_add node:$lhs, node:$rhs),
		(X86fp80_add node:$lhs, node:$rhs)]>;

def X86fld : SDNode<"X86ISD::FLD", SDTX86Fld,		def X86fld : SDNode<"X86ISD::FLD", SDTX86Fld,
[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;		[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
def X86fst : SDNode<"X86ISD::FST", SDTX86Fst,		def X86fst : SDNode<"X86ISD::FST", SDTX86Fst,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;		[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def X86fild : SDNode<"X86ISD::FILD", SDTX86Fild,		def X86fild : SDNode<"X86ISD::FILD", SDTX86Fild,
[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;		[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
def X86fist : SDNode<"X86ISD::FIST", SDTX86Fist,		def X86fist : SDNode<"X86ISD::FIST", SDTX86Fist,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;		[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
▲ Show 20 Lines • Show All 99 Lines • ▼ Show 20 Lines	let usesCustomInserter = 1, hasNoSchedulingInfo = 1, Defs = [EFLAGS] in {
def FP64_TO_INT64_IN_MEM : PseudoI<(outs), (ins i64mem:$dst, RFP64:$src),		def FP64_TO_INT64_IN_MEM : PseudoI<(outs), (ins i64mem:$dst, RFP64:$src),
[(X86fp_to_i64mem RFP64:$src, addr:$dst)]>;		[(X86fp_to_i64mem RFP64:$src, addr:$dst)]>;
def FP80_TO_INT16_IN_MEM : PseudoI<(outs), (ins i16mem:$dst, RFP80:$src),		def FP80_TO_INT16_IN_MEM : PseudoI<(outs), (ins i16mem:$dst, RFP80:$src),
[(X86fp_to_i16mem RFP80:$src, addr:$dst)]>;		[(X86fp_to_i16mem RFP80:$src, addr:$dst)]>;
def FP80_TO_INT32_IN_MEM : PseudoI<(outs), (ins i32mem:$dst, RFP80:$src),		def FP80_TO_INT32_IN_MEM : PseudoI<(outs), (ins i32mem:$dst, RFP80:$src),
[(X86fp_to_i32mem RFP80:$src, addr:$dst)]>;		[(X86fp_to_i32mem RFP80:$src, addr:$dst)]>;
def FP80_TO_INT64_IN_MEM : PseudoI<(outs), (ins i64mem:$dst, RFP80:$src),		def FP80_TO_INT64_IN_MEM : PseudoI<(outs), (ins i64mem:$dst, RFP80:$src),
[(X86fp_to_i64mem RFP80:$src, addr:$dst)]>;		[(X86fp_to_i64mem RFP80:$src, addr:$dst)]>;

		def FP80_ADDr : PseudoI<(outs RFP80:$dst), (ins RFP80:$src1, RFP80:$src2),
		[(set RFP80:$dst,
		(any_X86fp80_add RFP80:$src1, RFP80:$src2))]>;
		def FP80_ADDm32 : PseudoI<(outs RFP80:$dst), (ins RFP80:$src1, f32mem:$src2),
		[(set RFP80:$dst,
		(any_X86fp80_add RFP80:$src1,
		(f80 (extloadf32 addr:$src2))))]>;
}		}

// All FP Stack operations are represented with four instructions here. The		// All FP Stack operations are represented with four instructions here. The
// first three instructions, generated by the instruction selector, use "RFP32"		// first three instructions, generated by the instruction selector, use "RFP32"
// "RFP64" or "RFP80" registers: traditional register files to reference 32-bit,		// "RFP64" or "RFP80" registers: traditional register files to reference 32-bit,
// 64-bit or 80-bit floating point values. These sizes apply to the values,		// 64-bit or 80-bit floating point values. These sizes apply to the values,
// not the registers, which are always 80 bits; RFP32, RFP64 and RFP80 can be		// not the registers, which are always 80 bits; RFP32, RFP64 and RFP80 can be
// copied to each other without losing information. These instructions are all		// copied to each other without losing information. These instructions are all
▲ Show 20 Lines • Show All 675 Lines • Show Last 20 Lines

llvm/test/CodeGen/X86/uint64-to-float.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i686-apple-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X86			; RUN: llc < %s -mtriple=i686-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X86
	; RUN: llc < %s -mtriple=x86_64-apple-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X64			; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X64
				; RUN: llc < %s -mtriple=i686-windows -mattr=+sse2 | FileCheck %s --check-prefix=X86-WIN
				; RUN: llc < %s -mtriple=x86_64-windows -mattr=+sse2 | FileCheck %s --check-prefix=X64-WIN

	; Verify that we are using the efficient uitofp --> sitofp lowering illustrated			; Verify that we are using the efficient uitofp --> sitofp lowering illustrated
	; by the compiler_rt implementation of __floatundisf.			; by the compiler_rt implementation of __floatundisf.
	; <rdar://problem/8493982>			; <rdar://problem/8493982>

	define float @test(i64 %a) nounwind {			define float @test(i64 %a) nounwind {
	; X86-LABEL: test:			; X86-LABEL: test:
	; X86: # %bb.0: # %entry			; X86: # %bb.0: # %entry
	Show All 25 Lines
	; X64-NEXT: .LBB0_1:			; X64-NEXT: .LBB0_1:
	; X64-NEXT: movq %rdi, %rax			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: shrq %rax			; X64-NEXT: shrq %rax
	; X64-NEXT: andl $1, %edi			; X64-NEXT: andl $1, %edi
	; X64-NEXT: orq %rax, %rdi			; X64-NEXT: orq %rax, %rdi
	; X64-NEXT: cvtsi2ss %rdi, %xmm0			; X64-NEXT: cvtsi2ss %rdi, %xmm0
	; X64-NEXT: addss %xmm0, %xmm0			; X64-NEXT: addss %xmm0, %xmm0
	; X64-NEXT: retq			; X64-NEXT: retq
				;
				; X86-WIN-LABEL: test:
				; X86-WIN: # %bb.0: # %entry
				; X86-WIN-NEXT: pushl %ebp
				; X86-WIN-NEXT: movl %esp, %ebp
				; X86-WIN-NEXT: andl $-8, %esp
				; X86-WIN-NEXT: subl $24, %esp
				; X86-WIN-NEXT: movl 12(%ebp), %eax
				; X86-WIN-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
				; X86-WIN-NEXT: movlps %xmm0, {{[0-9]+}}(%esp)
				; X86-WIN-NEXT: shrl $31, %eax
				; X86-WIN-NEXT: fildll {{[0-9]+}}(%esp)
				; X86-WIN-NEXT: fnstcw {{[0-9]+}}(%esp)
				; X86-WIN-NEXT: movzwl {{[0-9]+}}(%esp), %ecx
				; X86-WIN-NEXT: orl $768, %ecx # imm = 0x300
				; X86-WIN-NEXT: movw %cx, {{[0-9]+}}(%esp)
				; X86-WIN-NEXT: fldcw {{[0-9]+}}(%esp)
				; X86-WIN-NEXT: fadds __real@5f80000000000000(,%eax,4)
				; X86-WIN-NEXT: fldcw {{[0-9]+}}(%esp)
				; X86-WIN-NEXT: fstps {{[0-9]+}}(%esp)
				; X86-WIN-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; X86-WIN-NEXT: movss %xmm0, {{[0-9]+}}(%esp)
				; X86-WIN-NEXT: flds {{[0-9]+}}(%esp)
				; X86-WIN-NEXT: movl %ebp, %esp
				; X86-WIN-NEXT: popl %ebp
				; X86-WIN-NEXT: retl
				;
				; X64-WIN-LABEL: test:
				; X64-WIN: # %bb.0: # %entry
				; X64-WIN-NEXT: testq %rcx, %rcx
				; X64-WIN-NEXT: js .LBB0_1
				; X64-WIN-NEXT: # %bb.2: # %entry
				; X64-WIN-NEXT: cvtsi2ss %rcx, %xmm0
				; X64-WIN-NEXT: retq
				; X64-WIN-NEXT: .LBB0_1:
				; X64-WIN-NEXT: movq %rcx, %rax
				; X64-WIN-NEXT: shrq %rax
				; X64-WIN-NEXT: andl $1, %ecx
				; X64-WIN-NEXT: orq %rax, %rcx
				; X64-WIN-NEXT: cvtsi2ss %rcx, %xmm0
				; X64-WIN-NEXT: addss %xmm0, %xmm0
				; X64-WIN-NEXT: retq
	entry:			entry:
	%b = uitofp i64 %a to float			%b = uitofp i64 %a to float
	ret float %b			ret float %b
	}			}

This is an archive of the discontinued LLVM Phabricator instance.

[X86] Change precision control to FP80 during u64->fp32 conversion on Windows.
ClosedPublic
