This is an archive of the discontinued LLVM Phabricator instance.

fix invalid load folding with SSE/AVX FP logical instructions (PR22371)
Closed, Public

Authored by spatel on Jul 23 2015, 3:11 PM.

Details

Reviewers

qcolombet
chandlerc
hfinkel

Commits

rG453045851898: Merging r243361: --------------------------------------------------------------…
rG8c13e3680d3f: fix invalid load folding with SSE/AVX FP logical instructions (PR22371)
rL243361: fix invalid load folding with SSE/AVX FP logical instructions (PR22371)

Summary

This is a follow-up to the FIXME that was added with D7474 ( http://reviews.llvm.org/rL229531 ).
I thought this load folding bug had been made hard to hit, but it turns out to be very easy to hit when targeting 32-bit x86, and it causes a miscompile/crash in Wine:
https://bugs.winehq.org/show_bug.cgi?id=38826
https://llvm.org/bugs/show_bug.cgi?id=22371#c25

The quick fix is to simply remove the scalar FP logical instructions from the load folding table in X86InstrInfo, but that causes us to miss load folds that should be possible when lowering fabs, fneg, fcopysign. So the majority of this patch is altering those lowerings to use *vector* FP logical instructions (because that's all x86 gives us anyway). That lets us do the load folding legally.
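The scalar-through-vector approach described above can be modeled in plain C++. This is only a sketch under stated assumptions: `Vec4`, `scalarToVector`, `extractLane0`, `splat`, and `logic` are hypothetical stand-ins for a 128-bit register, SCALAR_TO_VECTOR, EXTRACT_VECTOR_ELT, a splatted mask constant, and the FAND/FOR/FXOR nodes. It is not LLVM code, and it assumes 32-bit IEEE-754 `float`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Model of a 128-bit register holding four 32-bit lanes (stand-in for v4f32).
using Vec4 = std::array<std::uint32_t, 4>;

enum class LogicOp { And, Or, Xor }; // stand-ins for FAND / FOR / FXOR

// Model of SCALAR_TO_VECTOR: the scalar's bits go into lane 0. The upper
// lanes are zeroed here only to keep the model deterministic.
Vec4 scalarToVector(float X) {
  Vec4 V{};
  std::memcpy(&V[0], &X, sizeof X);
  return V;
}

// Model of EXTRACT_VECTOR_ELT with index 0.
float extractLane0(const Vec4 &V) {
  float X;
  std::memcpy(&X, &V[0], sizeof X);
  return X;
}

// A 16-byte mask constant with the same element in every lane.
Vec4 splat(std::uint32_t Elt) { return {Elt, Elt, Elt, Elt}; }

// Apply the logic op to all lanes, as a packed instruction would.
Vec4 logic(LogicOp Op, const Vec4 &A, const Vec4 &B) {
  Vec4 R{};
  for (std::size_t I = 0; I < R.size(); ++I) {
    switch (Op) {
    case LogicOp::And: R[I] = A[I] & B[I]; break;
    case LogicOp::Or:  R[I] = A[I] | B[I]; break;
    case LogicOp::Xor: R[I] = A[I] ^ B[I]; break;
    }
  }
  return R;
}

// fabs: AND with 0x7f...; fneg: XOR with 0x80...; fnabs: OR with 0x80...
float scalarFabs(float X) {
  return extractLane0(logic(LogicOp::And, scalarToVector(X), splat(0x7fffffffu)));
}
float scalarFneg(float X) {
  return extractLane0(logic(LogicOp::Xor, scalarToVector(X), splat(0x80000000u)));
}
float scalarFnabs(float X) {
  return extractLane0(logic(LogicOp::Or, scalarToVector(X), splat(0x80000000u)));
}

// copysign: keep the magnitude bits of Mag and the sign bit of Sign.
float scalarCopysign(float Mag, float Sign) {
  Vec4 SignBit = logic(LogicOp::And, scalarToVector(Sign), splat(0x80000000u));
  Vec4 Val = logic(LogicOp::And, scalarToVector(Mag), splat(0x7fffffffu));
  return extractLane0(logic(LogicOp::Or, Val, SignBit));
}

// Small self-check of the model.
void checkLoweringModel() {
  assert(scalarFabs(-2.5f) == 2.5f);
  assert(scalarFneg(1.0f) == -1.0f);
}
```

Only lane 0 is ever observed by the scalar result, so the contents of the other lanes do not matter. The point of the 16-byte mask is that its load has the same width as the packed instruction's memory operand.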

The test case for PR2656 was actually checking for miscompiled code, so I changed that. I added the latest test case from PR22371 for extra verification. The changes in sse-fcopysign.ll look benign to me; just different scheduling / RA. I'm not sure why we had 'vandps' and now have 'vandpd' in vec_fabs.ll, but those are logically identical.
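The claim that the 'ps' and 'pd' forms are logically identical can be checked with a small model. The sketch below ANDs the same 16 bytes once as four 32-bit lanes and once as two 64-bit lanes; `Bytes16`, `andWithLanes`, and `lanesDoNotMatter` are hypothetical names used only for illustration, and the model says nothing about timing or execution domains on real hardware.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Bytes16 = std::array<unsigned char, 16>;

// AND two 16-byte values, treating them as lanes of type Lane.
template <typename Lane>
Bytes16 andWithLanes(const Bytes16 &A, const Bytes16 &B) {
  constexpr std::size_t N = 16 / sizeof(Lane);
  std::array<Lane, N> X, Y;
  std::memcpy(X.data(), A.data(), 16);
  std::memcpy(Y.data(), B.data(), 16);
  for (std::size_t I = 0; I < N; ++I)
    X[I] &= Y[I];
  Bytes16 R;
  std::memcpy(R.data(), X.data(), 16);
  return R;
}

// A bitwise AND acts on each bit independently, so the lane width used to
// carry it out cannot change the resulting bytes.
bool lanesDoNotMatter(const Bytes16 &A, const Bytes16 &B) {
  return andWithLanes<std::uint32_t>(A, B) == andWithLanes<std::uint64_t>(A, B);
}

// Small self-check of the model.
void checkLaneWidths() {
  Bytes16 A{}, B{};
  A.fill(0xff);
  B.fill(0x7f);
  assert(lanesDoNotMatter(A, B));
}
```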

Diff Detail

Repository: rL LLVM

Event Timeline

spatel updated this revision to Diff 30525. (Jul 23 2015, 3:11 PM)

spatel retitled this revision to "fix invalid load folding with SSE/AVX FP logical instructions (PR22371)".

spatel updated this object.

spatel added reviewers: chandlerc, qcolombet, hfinkel.

spatel added a subscriber: llvm-commits.

hans added a subscriber: hans. (Jul 23 2015, 5:05 PM)

Is anyone reviewing this? It would be nice to get this PR fixed for 3.7.

LGTM, and looks good for the branch as well.

test/CodeGen/X86/pr2656.ll
Lines 9–10 (on Diff #30525): I'd specifically call out that we can do a 16-byte constant pool load for the xorps mask used to negate these values; it just isn't folded because it is used twice. Otherwise it's a bit confusing to read the comment followed by this particular example.

This revision is now accepted and ready to land. (Jul 27 2015, 3:17 PM)

spatel added inline comments. (Jul 27 2015, 5:37 PM)

test/CodeGen/X86/pr2656.ll
Lines 9–10 (on Diff #30525): Yes, that is confusing on second look. I'll fix that and get this checked in. Thanks for the prompt review!

Closed by commit rL243361: fix invalid load folding with SSE/AVX FP logical instructions (PR22371) (authored by spatel). (Jul 27 2015, 5:49 PM)

This revision was automatically updated to reflect the committed changes.

Merged to 3.7 in r243435.

Revision Contents

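Path                                            Size
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp   82 lines
llvm/trunk/lib/Target/X86/X86InstrInfo.cpp      15 lines
llvm/trunk/lib/Target/X86/X86InstrSSE.td        8 lines
llvm/trunk/test/CodeGen/X86/pr2656.ll           32 lines
llvm/trunk/test/CodeGen/X86/sse-fcopysign.ll    32 lines
llvm/trunk/test/CodeGen/X86/vec_fabs.ll         4 lines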

Diff 30767

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

static SDValue LowerFABSorFNEG(SDValue Op, SelectionDAG &DAG) {
[...]

// If this is a FABS and it has an FNEG user, bail out to fold the combination		// If this is a FABS and it has an FNEG user, bail out to fold the combination
// into an FNABS. We'll lower the FABS after that if it is still in use.		// into an FNABS. We'll lower the FABS after that if it is still in use.
if (IsFABS)		if (IsFABS)
for (SDNode *User : Op->uses())		for (SDNode *User : Op->uses())
if (User->getOpcode() == ISD::FNEG)		if (User->getOpcode() == ISD::FNEG)
return Op;		return Op;

SDValue Op0 = Op.getOperand(0);
bool IsFNABS = !IsFABS && (Op0.getOpcode() == ISD::FABS);

SDLoc dl(Op);		SDLoc dl(Op);
MVT VT = Op.getSimpleValueType();		MVT VT = Op.getSimpleValueType();
// Assume scalar op for initialization; update for vector if needed.
// Note that there are no scalar bitwise logical SSE/AVX instructions, so we
// generate a 16-byte vector constant and logic op even for the scalar case.
// Using a 16-byte mask allows folding the load of the mask with
// the logic op, so it can save (~4 bytes) on code size.
MVT EltVT = VT;
unsigned NumElts = VT == MVT::f64 ? 2 : 4;
// FIXME: Use function attribute "OptimizeForSize" and/or CodeGenOpt::Level to		// FIXME: Use function attribute "OptimizeForSize" and/or CodeGenOpt::Level to
// decide if we should generate a 16-byte constant mask when we only need 4 or		// decide if we should generate a 16-byte constant mask when we only need 4 or
// 8 bytes for the scalar case.		// 8 bytes for the scalar case.

		MVT LogicVT;
		MVT EltVT;
		unsigned NumElts;

if (VT.isVector()) {		if (VT.isVector()) {
		LogicVT = VT;
EltVT = VT.getVectorElementType();		EltVT = VT.getVectorElementType();
NumElts = VT.getVectorNumElements();		NumElts = VT.getVectorNumElements();
		} else {
		// There are no scalar bitwise logical SSE/AVX instructions, so we
		// generate a 16-byte vector constant and logic op even for the scalar case.
		// Using a 16-byte mask allows folding the load of the mask with
		// the logic op, so it can save (~4 bytes) on code size.
		LogicVT = (VT == MVT::f64) ? MVT::v2f64 : MVT::v4f32;
		EltVT = VT;
		NumElts = (VT == MVT::f64) ? 2 : 4;
}		}

unsigned EltBits = EltVT.getSizeInBits();		unsigned EltBits = EltVT.getSizeInBits();
LLVMContext *Context = DAG.getContext();		LLVMContext *Context = DAG.getContext();
// For FABS, mask is 0x7f...; for FNEG, mask is 0x80...		// For FABS, mask is 0x7f...; for FNEG, mask is 0x80...
APInt MaskElt =		APInt MaskElt =
IsFABS ? APInt::getSignedMaxValue(EltBits) : APInt::getSignBit(EltBits);		IsFABS ? APInt::getSignedMaxValue(EltBits) : APInt::getSignBit(EltBits);
Constant *C = ConstantInt::get(*Context, MaskElt);		Constant *C = ConstantInt::get(*Context, MaskElt);
C = ConstantVector::getSplat(NumElts, C);		C = ConstantVector::getSplat(NumElts, C);
const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
SDValue CPIdx = DAG.getConstantPool(C, TLI.getPointerTy(DAG.getDataLayout()));		SDValue CPIdx = DAG.getConstantPool(C, TLI.getPointerTy(DAG.getDataLayout()));
unsigned Alignment = cast<ConstantPoolSDNode>(CPIdx)->getAlignment();		unsigned Alignment = cast<ConstantPoolSDNode>(CPIdx)->getAlignment();
SDValue Mask = DAG.getLoad(VT, dl, DAG.getEntryNode(), CPIdx,		SDValue Mask = DAG.getLoad(LogicVT, dl, DAG.getEntryNode(), CPIdx,
MachinePointerInfo::getConstantPool(),		MachinePointerInfo::getConstantPool(),
false, false, false, Alignment);		false, false, false, Alignment);

if (VT.isVector()) {		SDValue Op0 = Op.getOperand(0);
// For a vector, cast operands to a vector type, perform the logic op,		bool IsFNABS = !IsFABS && (Op0.getOpcode() == ISD::FABS);
// and cast the result back to the original value type.		unsigned LogicOp =
MVT VecVT = MVT::getVectorVT(MVT::i64, VT.getSizeInBits() / 64);		IsFABS ? X86ISD::FAND : IsFNABS ? X86ISD::FOR : X86ISD::FXOR;
SDValue MaskCasted = DAG.getBitcast(VecVT, Mask);
SDValue Operand = IsFNABS ? DAG.getBitcast(VecVT, Op0.getOperand(0))
: DAG.getBitcast(VecVT, Op0);
unsigned BitOp = IsFABS ? ISD::AND : IsFNABS ? ISD::OR : ISD::XOR;
return DAG.getBitcast(VT,
DAG.getNode(BitOp, dl, VecVT, Operand, MaskCasted));
}

// If not vector, then scalar.
unsigned BitOp = IsFABS ? X86ISD::FAND : IsFNABS ? X86ISD::FOR : X86ISD::FXOR;
SDValue Operand = IsFNABS ? Op0.getOperand(0) : Op0;		SDValue Operand = IsFNABS ? Op0.getOperand(0) : Op0;
return DAG.getNode(BitOp, dl, VT, Operand, Mask);
		if (VT.isVector())
		return DAG.getNode(LogicOp, dl, LogicVT, Operand, Mask);

		// For the scalar case extend to a 128-bit vector, perform the logic op,
		// and extract the scalar result back out.
		Operand = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, LogicVT, Operand);
		SDValue LogicNode = DAG.getNode(LogicOp, dl, LogicVT, Operand, Mask);
		return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, VT, LogicNode,
		DAG.getIntPtrConstant(0, dl));
}		}

static SDValue LowerFCOPYSIGN(SDValue Op, SelectionDAG &DAG) {		static SDValue LowerFCOPYSIGN(SDValue Op, SelectionDAG &DAG) {
const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
LLVMContext *Context = DAG.getContext();		LLVMContext *Context = DAG.getContext();
SDValue Op0 = Op.getOperand(0);		SDValue Op0 = Op.getOperand(0);
SDValue Op1 = Op.getOperand(1);		SDValue Op1 = Op.getOperand(1);
SDLoc dl(Op);		SDLoc dl(Op);
[23 lines not shown]	SmallVector<Constant *, 4> CV(
ConstantFP::get(*Context, APFloat(Sem, APInt(SizeInBits, 0))));		ConstantFP::get(*Context, APFloat(Sem, APInt(SizeInBits, 0))));

// First, clear all bits but the sign bit from the second operand (sign).		// First, clear all bits but the sign bit from the second operand (sign).
CV[0] = ConstantFP::get(*Context,		CV[0] = ConstantFP::get(*Context,
APFloat(Sem, APInt::getHighBitsSet(SizeInBits, 1)));		APFloat(Sem, APInt::getHighBitsSet(SizeInBits, 1)));
Constant *C = ConstantVector::get(CV);		Constant *C = ConstantVector::get(CV);
auto PtrVT = TLI.getPointerTy(DAG.getDataLayout());		auto PtrVT = TLI.getPointerTy(DAG.getDataLayout());
SDValue CPIdx = DAG.getConstantPool(C, PtrVT, 16);		SDValue CPIdx = DAG.getConstantPool(C, PtrVT, 16);
SDValue Mask1 = DAG.getLoad(SrcVT, dl, DAG.getEntryNode(), CPIdx,
		// Perform all logic operations as 16-byte vectors because there are no
		// scalar FP logic instructions in SSE. This allows load folding of the
		// constants into the logic instructions.
		MVT LogicVT = (VT == MVT::f64) ? MVT::v2f64 : MVT::v4f32;
		SDValue Mask1 = DAG.getLoad(LogicVT, dl, DAG.getEntryNode(), CPIdx,
MachinePointerInfo::getConstantPool(),		MachinePointerInfo::getConstantPool(),
false, false, false, 16);		false, false, false, 16);
SDValue SignBit = DAG.getNode(X86ISD::FAND, dl, SrcVT, Op1, Mask1);		Op1 = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, LogicVT, Op1);
		SDValue SignBit = DAG.getNode(X86ISD::FAND, dl, LogicVT, Op1, Mask1);

// Next, clear the sign bit from the first operand (magnitude).		// Next, clear the sign bit from the first operand (magnitude).
// If it's a constant, we can clear it here.		// If it's a constant, we can clear it here.
if (ConstantFPSDNode *Op0CN = dyn_cast<ConstantFPSDNode>(Op0)) {		if (ConstantFPSDNode *Op0CN = dyn_cast<ConstantFPSDNode>(Op0)) {
APFloat APF = Op0CN->getValueAPF();		APFloat APF = Op0CN->getValueAPF();
// If the magnitude is a positive zero, the sign bit alone is enough.		// If the magnitude is a positive zero, the sign bit alone is enough.
if (APF.isPosZero())		if (APF.isPosZero())
return SignBit;		return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, SrcVT, SignBit,
		DAG.getIntPtrConstant(0, dl));
APF.clearSign();		APF.clearSign();
CV[0] = ConstantFP::get(*Context, APF);		CV[0] = ConstantFP::get(*Context, APF);
} else {		} else {
CV[0] = ConstantFP::get(		CV[0] = ConstantFP::get(
*Context,		*Context,
APFloat(Sem, APInt::getLowBitsSet(SizeInBits, SizeInBits - 1)));		APFloat(Sem, APInt::getLowBitsSet(SizeInBits, SizeInBits - 1)));
}		}
C = ConstantVector::get(CV);		C = ConstantVector::get(CV);
CPIdx = DAG.getConstantPool(C, PtrVT, 16);		CPIdx = DAG.getConstantPool(C, PtrVT, 16);
SDValue Val = DAG.getLoad(VT, dl, DAG.getEntryNode(), CPIdx,		SDValue Val = DAG.getLoad(LogicVT, dl, DAG.getEntryNode(), CPIdx,
MachinePointerInfo::getConstantPool(),		MachinePointerInfo::getConstantPool(),
false, false, false, 16);		false, false, false, 16);
// If the magnitude operand wasn't a constant, we need to AND out the sign.		// If the magnitude operand wasn't a constant, we need to AND out the sign.
if (!isa<ConstantFPSDNode>(Op0))		if (!isa<ConstantFPSDNode>(Op0)) {
Val = DAG.getNode(X86ISD::FAND, dl, VT, Op0, Val);		Op0 = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, LogicVT, Op0);
		Val = DAG.getNode(X86ISD::FAND, dl, LogicVT, Op0, Val);
		}
// OR the magnitude value with the sign bit.		// OR the magnitude value with the sign bit.
return DAG.getNode(X86ISD::FOR, dl, VT, Val, SignBit);		Val = DAG.getNode(X86ISD::FOR, dl, LogicVT, Val, SignBit);
		return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, SrcVT, Val,
		DAG.getIntPtrConstant(0, dl));
}		}

static SDValue LowerFGETSIGN(SDValue Op, SelectionDAG &DAG) {		static SDValue LowerFGETSIGN(SDValue Op, SelectionDAG &DAG) {
SDValue N0 = Op.getOperand(0);		SDValue N0 = Op.getOperand(0);
SDLoc dl(Op);		SDLoc dl(Op);
MVT VT = Op.getSimpleValueType();		MVT VT = Op.getSimpleValueType();

// Lower ISD::FGETSIGN to (AND (X86ISD::FGETSIGNx86 ...) 1).		// Lower ISD::FGETSIGN to (AND (X86ISD::FGETSIGNx86 ...) 1).

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp

static const X86MemoryFoldTableEntry MemoryFoldTable2[] = {
[...]
{ X86::DIVPSrr, X86::DIVPSrm, TB_ALIGN_16 },		{ X86::DIVPSrr, X86::DIVPSrm, TB_ALIGN_16 },
{ X86::DIVSDrr, X86::DIVSDrm, 0 },		{ X86::DIVSDrr, X86::DIVSDrm, 0 },
{ X86::DIVSDrr_Int, X86::DIVSDrm_Int, 0 },		{ X86::DIVSDrr_Int, X86::DIVSDrm_Int, 0 },
{ X86::DIVSSrr, X86::DIVSSrm, 0 },		{ X86::DIVSSrr, X86::DIVSSrm, 0 },
{ X86::DIVSSrr_Int, X86::DIVSSrm_Int, 0 },		{ X86::DIVSSrr_Int, X86::DIVSSrm_Int, 0 },
{ X86::DPPDrri, X86::DPPDrmi, TB_ALIGN_16 },		{ X86::DPPDrri, X86::DPPDrmi, TB_ALIGN_16 },
{ X86::DPPSrri, X86::DPPSrmi, TB_ALIGN_16 },		{ X86::DPPSrri, X86::DPPSrmi, TB_ALIGN_16 },

// FIXME: We should not be folding Fs* scalar loads into vector		// Do not fold Fs* scalar logical op loads because there are no scalar
// instructions because the vector instructions require vector-sized		// load variants for these instructions. When folded, the load is required
// loads. Lowering should create vector-sized instructions (the Fv*		// to be 128-bits, so the load size would not match.
// variants below) to allow load folding.
{ X86::FsANDNPDrr, X86::FsANDNPDrm, TB_ALIGN_16 },
{ X86::FsANDNPSrr, X86::FsANDNPSrm, TB_ALIGN_16 },
{ X86::FsANDPDrr, X86::FsANDPDrm, TB_ALIGN_16 },
{ X86::FsANDPSrr, X86::FsANDPSrm, TB_ALIGN_16 },
{ X86::FsORPDrr, X86::FsORPDrm, TB_ALIGN_16 },
{ X86::FsORPSrr, X86::FsORPSrm, TB_ALIGN_16 },
{ X86::FsXORPDrr, X86::FsXORPDrm, TB_ALIGN_16 },
{ X86::FsXORPSrr, X86::FsXORPSrm, TB_ALIGN_16 },

{ X86::FvANDNPDrr, X86::FvANDNPDrm, TB_ALIGN_16 },		{ X86::FvANDNPDrr, X86::FvANDNPDrm, TB_ALIGN_16 },
{ X86::FvANDNPSrr, X86::FvANDNPSrm, TB_ALIGN_16 },		{ X86::FvANDNPSrr, X86::FvANDNPSrm, TB_ALIGN_16 },
{ X86::FvANDPDrr, X86::FvANDPDrm, TB_ALIGN_16 },		{ X86::FvANDPDrr, X86::FvANDPDrm, TB_ALIGN_16 },
{ X86::FvANDPSrr, X86::FvANDPSrm, TB_ALIGN_16 },		{ X86::FvANDPSrr, X86::FvANDPSrm, TB_ALIGN_16 },
{ X86::FvORPDrr, X86::FvORPDrm, TB_ALIGN_16 },		{ X86::FvORPDrr, X86::FvORPDrm, TB_ALIGN_16 },
{ X86::FvORPSrr, X86::FvORPSrm, TB_ALIGN_16 },		{ X86::FvORPSrr, X86::FvORPSrm, TB_ALIGN_16 },
{ X86::FvXORPDrr, X86::FvXORPDrm, TB_ALIGN_16 },		{ X86::FvXORPDrr, X86::FvXORPDrm, TB_ALIGN_16 },

llvm/trunk/lib/Target/X86/X86InstrSSE.td

multiclass sse12_fp_packed_vector_logical_alias<
[...]
let Predicates = [HasAVX, NoVLX] in {		let Predicates = [HasAVX, NoVLX] in {
defm V#NAME#PS : sse12_fp_packed<opc, !strconcat(OpcodeStr, "ps"), OpNode,		defm V#NAME#PS : sse12_fp_packed<opc, !strconcat(OpcodeStr, "ps"), OpNode,
VR128, v4f32, f128mem, loadv4f32, SSEPackedSingle, itins, 0>,		VR128, v4f32, f128mem, loadv4f32, SSEPackedSingle, itins, 0>,
PS, VEX_4V;		PS, VEX_4V;

defm V#NAME#PD : sse12_fp_packed<opc, !strconcat(OpcodeStr, "pd"), OpNode,		defm V#NAME#PD : sse12_fp_packed<opc, !strconcat(OpcodeStr, "pd"), OpNode,
VR128, v2f64, f128mem, loadv2f64, SSEPackedDouble, itins, 0>,		VR128, v2f64, f128mem, loadv2f64, SSEPackedDouble, itins, 0>,
PD, VEX_4V;		PD, VEX_4V;

		defm V#NAME#PSY : sse12_fp_packed<opc, !strconcat(OpcodeStr, "ps"), OpNode,
		VR256, v8f32, f256mem, loadv8f32, SSEPackedSingle, itins, 0>,
		PS, VEX_4V, VEX_L;

		defm V#NAME#PDY : sse12_fp_packed<opc, !strconcat(OpcodeStr, "pd"), OpNode,
		VR256, v4f64, f256mem, loadv4f64, SSEPackedDouble, itins, 0>,
		PD, VEX_4V, VEX_L;
}		}

let Constraints = "$src1 = $dst" in {		let Constraints = "$src1 = $dst" in {
defm PS : sse12_fp_packed<opc, !strconcat(OpcodeStr, "ps"), OpNode, VR128,		defm PS : sse12_fp_packed<opc, !strconcat(OpcodeStr, "ps"), OpNode, VR128,
v4f32, f128mem, memopv4f32, SSEPackedSingle, itins>,		v4f32, f128mem, memopv4f32, SSEPackedSingle, itins>,
PS;		PS;

defm PD : sse12_fp_packed<opc, !strconcat(OpcodeStr, "pd"), OpNode, VR128,		defm PD : sse12_fp_packed<opc, !strconcat(OpcodeStr, "pd"), OpNode, VR128,

llvm/trunk/test/CodeGen/X86/pr2656.ll

	; RUN: llc < %s -march=x86 -mattr=+sse2 | FileCheck %s			; RUN: llc < %s -march=x86 -mattr=+sse2 | FileCheck %s
	; PR2656			; PR2656

	; CHECK: {{xorps.*sp}}
	; CHECK-NOT: {{xorps.*sp}}

	target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128"			target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128"
	target triple = "i686-apple-darwin9.4.0"			target triple = "i686-apple-darwin9.4.0"
	%struct.anon = type <{ float, float }>			%struct.anon = type <{ float, float }>
	@.str = internal constant [17 x i8] c"pt: %.0f, %.0f\0A\00\00" ; <[17 x i8]*> [#uses=1]			@.str = internal constant [17 x i8] c"pt: %.0f, %.0f\0A\00\00" ; <[17 x i8]*> [#uses=1]

				; We can not fold either stack load into an 'xor' instruction because that
				; would change what should be a 4-byte load into a 16-byte load.
				; We can fold the 16-byte constant load into either 'xor' instruction,
				; but we do not. It has more than one use, so it gets loaded into a register.

	define void @foo(%struct.anon* byval %p) nounwind {			define void @foo(%struct.anon* byval %p) nounwind {
				; CHECK-LABEL: foo:
				; CHECK: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; CHECK-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
				; CHECK-NEXT: movaps {{.*#+}} xmm2 = [2147483648,2147483648,2147483648,2147483648]
				; CHECK-NEXT: xorps %xmm2, %xmm0
				; CHECK-NEXT: cvtss2sd %xmm0, %xmm0
				; CHECK-NEXT: xorps %xmm2, %xmm1
	entry:			entry:
	%tmp = getelementptr %struct.anon, %struct.anon* %p, i32 0, i32 0 ; <float*> [#uses=1]			%tmp = getelementptr %struct.anon, %struct.anon* %p, i32 0, i32 0 ; <float*> [#uses=1]
	%tmp1 = load float, float* %tmp ; <float> [#uses=1]			%tmp1 = load float, float* %tmp ; <float> [#uses=1]
	%tmp2 = getelementptr %struct.anon, %struct.anon* %p, i32 0, i32 1 ; <float*> [#uses=1]			%tmp2 = getelementptr %struct.anon, %struct.anon* %p, i32 0, i32 1 ; <float*> [#uses=1]
	%tmp3 = load float, float* %tmp2 ; <float> [#uses=1]			%tmp3 = load float, float* %tmp2 ; <float> [#uses=1]
	%neg = fsub float -0.000000e+00, %tmp1 ; <float> [#uses=1]			%neg = fsub float -0.000000e+00, %tmp1 ; <float> [#uses=1]
	%conv = fpext float %neg to double ; <double> [#uses=1]			%conv = fpext float %neg to double ; <double> [#uses=1]
	%neg4 = fsub float -0.000000e+00, %tmp3 ; <float> [#uses=1]			%neg4 = fsub float -0.000000e+00, %tmp3 ; <float> [#uses=1]
	%conv5 = fpext float %neg4 to double ; <double> [#uses=1]			%conv5 = fpext float %neg4 to double ; <double> [#uses=1]
	%call = call i32 (...) @printf( i8* getelementptr ([17 x i8], [17 x i8]* @.str, i32 0, i32 0), double %conv, double %conv5 ) ; <i32> [#uses=0]			%call = call i32 (...) @printf( i8* getelementptr ([17 x i8], [17 x i8]* @.str, i32 0, i32 0), double %conv, double %conv5 ) ; <i32> [#uses=0]
	ret void			ret void
	}			}

	declare i32 @printf(...)			declare i32 @printf(...)

				; We can not fold the load from the stack into the 'and' instruction because
				; that changes an 8-byte load into a 16-byte load (illegal memory access).
				; We can fold the load of the constant because it is a 16-byte vector constant.

				define double @PR22371(double %x) {
				; CHECK-LABEL: PR22371:
				; CHECK: movsd 16(%esp), %xmm0
				; CHECK-NEXT: andpd LCPI1_0, %xmm0
				; CHECK-NEXT: movlpd %xmm0, (%esp)
				%call = tail call double @fabs(double %x) #0
				ret double %call
				}

				declare double @fabs(double) #0
				attributes #0 = { readnone }
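
The comments in this test rest on a size-matching rule that can be written down as a small sketch. It assumes, as a simplification and not as the actual X86InstrInfo logic, that a fold is sound only when the load being folded is exactly as wide as the memory operand the instruction reads. `FoldCandidate`, `isFoldLegal`, and `overReadBytes` are hypothetical names introduced for illustration.

```cpp
#include <cstddef>

// Hypothetical description of one fold candidate: the width of the load
// feeding the instruction and the width its memory-operand form reads.
struct FoldCandidate {
  std::size_t LoadBytes;
  std::size_t MemOperandBytes;
};

// Simplified legality rule: folding is sound only when both widths match.
constexpr bool isFoldLegal(FoldCandidate C) {
  return C.LoadBytes == C.MemOperandBytes;
}

// Bytes read past the end of the original load if the fold happened anyway.
constexpr std::size_t overReadBytes(FoldCandidate C) {
  return C.MemOperandBytes > C.LoadBytes ? C.MemOperandBytes - C.LoadBytes : 0;
}

// The three situations the test comments describe.
constexpr FoldCandidate FloatStackLoad{4, 16};    // 4-byte load into xorps
constexpr FoldCandidate DoubleStackLoad{8, 16};   // 8-byte load into andpd
constexpr FoldCandidate VectorConstantLoad{16, 16}; // 16-byte constant pool load

static_assert(!isFoldLegal(FloatStackLoad), "scalar float load must not fold");
static_assert(!isFoldLegal(DoubleStackLoad), "scalar double load must not fold");
static_assert(isFoldLegal(VectorConstantLoad), "16-byte constant may fold");
```

Under this model a folded 4-byte load would read 12 bytes it was never meant to touch, and a folded 8-byte load would read 8 extra bytes, which is the illegal memory access the second comment refers to.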

llvm/trunk/test/CodeGen/X86/sse-fcopysign.ll

	declare double @copysign(double, double)			declare double @copysign(double, double)

	;			;
	; LLVM Intrinsic			; LLVM Intrinsic
	;			;

	define float @int1(float %a, float %b) {			define float @int1(float %a, float %b) {
	; X32-LABEL: @int1			; X32-LABEL: @int1
	; X32: movss 12(%esp), %xmm0 {{.*#+}} xmm0 = mem[0],zero,zero,zero			; X32: movss 8(%esp), %xmm0 {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; X32-NEXT: movss 8(%esp), %xmm1 {{.*#+}} xmm1 = mem[0],zero,zero,zero			; X32-NEXT: andps .LCPI2_0, %xmm0
	; X32-NEXT: andps .LCPI2_0, %xmm1			; X32-NEXT: movss 12(%esp), %xmm1 {{.*#+}} xmm1 = mem[0],zero,zero,zero
	; X32-NEXT: andps .LCPI2_1, %xmm0			; X32-NEXT: andps .LCPI2_1, %xmm1
	; X32-NEXT: orps %xmm1, %xmm0			; X32-NEXT: orps %xmm0, %xmm1
	; X32-NEXT: movss %xmm0, (%esp)			; X32-NEXT: movss %xmm1, (%esp)
	; X32-NEXT: flds (%esp)			; X32-NEXT: flds (%esp)
	; X32-NEXT: popl %eax			; X32-NEXT: popl %eax
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: @int1			; X64-LABEL: @int1
	; X64: andps .LCPI2_0(%rip), %xmm0			; X64: andps .LCPI2_0(%rip), %xmm0
	; X64-NEXT: andps .LCPI2_1(%rip), %xmm1			; X64-NEXT: andps .LCPI2_1(%rip), %xmm1
	; X64-NEXT: orps %xmm1, %xmm0			; X64-NEXT: orps %xmm1, %xmm0
	; X64-NEXT: retq			; X64-NEXT: retq
	%tmp = tail call float @llvm.copysign.f32( float %b, float %a )			%tmp = tail call float @llvm.copysign.f32( float %b, float %a )
	ret float %tmp			ret float %tmp
	}			}

	define double @int2(double %a, float %b, float %c) {			define double @int2(double %a, float %b, float %c) {
	; X32-LABEL: @int2			; X32-LABEL: @int2
	; X32: movsd 8(%ebp), %xmm0 {{.*#+}} xmm0 = mem[0],zero			; X32: movss 16(%ebp), %xmm0 {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; X32-NEXT: movss 16(%ebp), %xmm1 {{.*#+}} xmm1 = mem[0],zero,zero,zero			; X32-NEXT: addss 20(%ebp), %xmm0
	; X32-NEXT: addss 20(%ebp), %xmm1			; X32-NEXT: movsd 8(%ebp), %xmm1 {{.*#+}} xmm1 = mem[0],zero
	; X32-NEXT: andpd .LCPI3_0, %xmm0			; X32-NEXT: andpd .LCPI3_0, %xmm1
	; X32-NEXT: cvtss2sd %xmm1, %xmm1			; X32-NEXT: cvtss2sd %xmm0, %xmm0
	; X32-NEXT: andpd .LCPI3_1, %xmm1			; X32-NEXT: andpd .LCPI3_1, %xmm0
	; X32-NEXT: orpd %xmm0, %xmm1			; X32-NEXT: orpd %xmm1, %xmm0
	; X32-NEXT: movsd %xmm1, (%esp)			; X32-NEXT: movlpd %xmm0, (%esp)
	; X32-NEXT: fldl (%esp)			; X32-NEXT: fldl (%esp)
	; X32-NEXT: movl %ebp, %esp			; X32-NEXT: movl %ebp, %esp
	; X32-NEXT: popl %ebp			; X32-NEXT: popl %ebp
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: @int2			; X64-LABEL: @int2
	; X64: addss %xmm2, %xmm1			; X64: addss %xmm2, %xmm1
	; X64-NEXT: andpd .LCPI3_0(%rip), %xmm0
	; X64-NEXT: cvtss2sd %xmm1, %xmm1			; X64-NEXT: cvtss2sd %xmm1, %xmm1
	; X64-NEXT: andpd .LCPI3_1(%rip), %xmm1			; X64-NEXT: andpd .LCPI3_0(%rip), %xmm1
				; X64-NEXT: andpd .LCPI3_1(%rip), %xmm0
	; X64-NEXT: orpd %xmm1, %xmm0			; X64-NEXT: orpd %xmm1, %xmm0
	; X64-NEXT: retq			; X64-NEXT: retq
	%tmp1 = fadd float %b, %c			%tmp1 = fadd float %b, %c
	%tmp2 = fpext float %tmp1 to double			%tmp2 = fpext float %tmp1 to double
	%tmp = tail call double @llvm.copysign.f64( double %a, double %tmp2 )			%tmp = tail call double @llvm.copysign.f64( double %a, double %tmp2 )
	ret double %tmp			ret double %tmp
	}			}

	[30 lines not shown]

llvm/trunk/test/CodeGen/X86/vec_fabs.ll

	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx | FileCheck %s


	define <2 x double> @fabs_v2f64(<2 x double> %p)			define <2 x double> @fabs_v2f64(<2 x double> %p)
	{			{
	; CHECK-LABEL: fabs_v2f64			; CHECK-LABEL: fabs_v2f64
	; CHECK: vandps			; CHECK: vandpd
	%t = call <2 x double> @llvm.fabs.v2f64(<2 x double> %p)			%t = call <2 x double> @llvm.fabs.v2f64(<2 x double> %p)
	ret <2 x double> %t			ret <2 x double> %t
	}			}
	declare <2 x double> @llvm.fabs.v2f64(<2 x double> %p)			declare <2 x double> @llvm.fabs.v2f64(<2 x double> %p)

	define <4 x float> @fabs_v4f32(<4 x float> %p)			define <4 x float> @fabs_v4f32(<4 x float> %p)
	{			{
	; CHECK-LABEL: fabs_v4f32			; CHECK-LABEL: fabs_v4f32
	; CHECK: vandps			; CHECK: vandps
	%t = call <4 x float> @llvm.fabs.v4f32(<4 x float> %p)			%t = call <4 x float> @llvm.fabs.v4f32(<4 x float> %p)
	ret <4 x float> %t			ret <4 x float> %t
	}			}
	declare <4 x float> @llvm.fabs.v4f32(<4 x float> %p)			declare <4 x float> @llvm.fabs.v4f32(<4 x float> %p)

	define <4 x double> @fabs_v4f64(<4 x double> %p)			define <4 x double> @fabs_v4f64(<4 x double> %p)
	{			{
	; CHECK-LABEL: fabs_v4f64			; CHECK-LABEL: fabs_v4f64
	; CHECK: vandps			; CHECK: vandpd
	%t = call <4 x double> @llvm.fabs.v4f64(<4 x double> %p)			%t = call <4 x double> @llvm.fabs.v4f64(<4 x double> %p)
	ret <4 x double> %t			ret <4 x double> %t
	}			}
	declare <4 x double> @llvm.fabs.v4f64(<4 x double> %p)			declare <4 x double> @llvm.fabs.v4f64(<4 x double> %p)

	define <8 x float> @fabs_v8f32(<8 x float> %p)			define <8 x float> @fabs_v8f32(<8 x float> %p)
	{			{
	; CHECK-LABEL: fabs_v8f32			; CHECK-LABEL: fabs_v8f32