This is an archive of the discontinued LLVM Phabricator instance.

Differential D35007

[PowerPC] Do not emit displacements for DQ-Form instructions that aren't multiples of 16
ClosedPublic

Authored by nemanjai on Jul 5 2017, 4:43 AM.

Details

Reviewers

hfinkel
echristo
kbarton
syzaara
sfertile
stefanp
lei
jtony
gyiu

Commits

rG3c7e276d2478: [PowerPC] Ensure displacements for DQ-Form instructions are multiples of 16
rL307934: [PowerPC] Ensure displacements for DQ-Form instructions are multiples of 16

Summary

PowerISA 3.0 defines some instructions (such as LXV and STXV) that encode their displacement as a quad-word offset (i.e. the effective address is calculated by shifting the value left by 4). For convenience and consistency with DS-Form instructions, the assembler takes a byte offset, so a byte offset that is not a multiple of 16 is meaningless.
The assembler already complains when assembling these instructions with an incorrect displacement; we just need to make sure we don't emit them this way.
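
To make the arithmetic in the summary concrete, here is a minimal sketch of how a 12-bit quad-word field turns into a byte displacement and an effective address. The function names are hypothetical and are not part of the patch or of LLVM; the 12-bit width is taken from the `getMemRIX16Encoding` comment in the diff.

```cpp
#include <cstdint>

// Hypothetical helper: sign-extend a 12-bit DQ field and scale it by 16
// (a left shift by 4) to obtain the byte displacement it represents.
int32_t dqFieldToByteOffset(uint16_t dqField) {
  int32_t value = dqField & 0xFFF;
  if (value & 0x800)   // bit 11 is the sign bit of the 12-bit field
    value -= 0x1000;
  return value * 16;
}

// Hypothetical helper: effective address = base + scaled displacement.
uint64_t effectiveAddress(uint64_t base, uint16_t dqField) {
  return base + static_cast<int64_t>(dqFieldToByteOffset(dqField));
}
```

Every displacement this decoding can produce is a multiple of 16, which is why a byte offset such as 20 has no valid encoding.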

This patch also fixes https://bugs.llvm.org/show_bug.cgi?id=33671.

Diff Detail

Repository: rL LLVM

Event Timeline

nemanjai created this revision. Jul 5 2017, 4:43 AM

nemanjai added inline comments. Jul 5 2017, 4:46 AM

lib/Target/PowerPC/PPCInstrInfo.td
408 ↗	(On Diff #105250)	If this solution is the way we want to proceed, perhaps it would be good to use the same approach for DS-Form instructions. These ultimately don't need to be aligned, they just can't have a displacement that isn't a multiple of 4.

hfinkel added inline comments. Jul 5 2017, 7:35 AM

lib/Target/PowerPC/PPCInstrInfo.td
408 ↗	(On Diff #105250)	Why can't we just check `cast<LoadSDNode>(N)->getAlignment() >= 16;` like we do above?

nemanjai added inline comments. Jul 5 2017, 7:52 AM

lib/Target/PowerPC/PPCInstrInfo.td
408 ↗	(On Diff #105250)	But we are not actually interested in alignment. These instructions do not have an alignment requirement. The only issue is that the DQ field encodes a quad-word offset (and what we're working with here is a byte offset). I can certainly see that the names are misleading and would be happy to change them. The only reason I didn't is that it seems like we might want to have some consistency with the above fragments. The test case I added illustrates the intent. It comes from this:

void test1(int *arr, int *arrTo) {
  vector int *ptr = (vector int *)(&arrTo[4]);
  *ptr = *(vector int *)(&arr[4]);
}
void test2(int *arr, int *arrTo) {
  vector int *ptr = (vector int *)(&arrTo[1]);
  *ptr = *(vector int *)(&arr[2]);
}

The `test1` function can use the DQ-Form instructions. The `test2` function cannot. And the alignment on the loads/stores is the same.
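
The difference between `test1` and `test2` comes down to simple offset arithmetic. The sketch below is only an illustration, with hypothetical names and an assumed 4-byte `int` (as on PowerPC); it computes the byte displacement of each element access and checks whether a DQ-Form displacement could express it.

```cpp
#include <cstddef>

// Assumption: int is 4 bytes wide, as on PowerPC targets.
constexpr std::ptrdiff_t kIntSize = 4;

// Byte displacement of &p[index] relative to p.
constexpr std::ptrdiff_t byteOffsetOfIntElement(std::ptrdiff_t index) {
  return index * kIntSize;
}

// A DQ-Form displacement can only express byte offsets that are
// multiples of 16.
constexpr bool fitsDQForm(std::ptrdiff_t byteOffset) {
  return byteOffset % 16 == 0;
}
```

Index 4 gives a displacement of 16 bytes, which fits; indices 1 and 2 give 4 and 8 bytes, which do not, regardless of the alignment recorded on the load or store.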

hfinkel added inline comments. Jul 6 2017, 6:25 PM

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
3022 ↗	(On Diff #105250)	Please write this as: SDValue AddrOp = cast<MemSDNode>(N)->getBasePtr();
lib/Target/PowerPC/PPCInstrInfo.td
408 ↗	(On Diff #105250)	"I can certainly see that the names are misleading and would be happy to change them." Yes, please do.

As it turns out, I was wrong in my assessment of Hal's comment. Although the instruction doesn't require any special alignment, allowing it to be used with weaker alignment allows other passes to modify the offset after ISEL. So verifying that Offset % 16 == 0 during ISEL isn't sufficient.
Furthermore, when we eliminate the FrameIndex the code does not have any awareness of instructions that need an immediate that is a multiple of 16.
I've made the following updates:

Don't emit LXV/STXV for unaligned loads/stores (use the X-Forms for that)
Teach FI elimination that some instructions need immediates that are multiples of 16
Fix the test cases that changed behaviour
Assert if we somehow end up with an instruction that needs an immediate as multiple of 16 but has an incorrect immediate
Specify alignment to the function that generates the displacements during ISEL
Add missing DS-Form instructions to the list of instructions that need a multiple-of-4 immediate
Fix loads/stores that were using incorrect addressing
Divide up the loads/stores into those that can handle unaligned addresses and those that can't

nemanjai added reviewers: sfertile, stefanp, lei, jtony, gyiu. Jul 7 2017, 9:33 AM

jtony added inline comments. Jul 7 2017, 9:59 AM

lib/Target/PowerPC/PPCISelLowering.cpp
2143 ↗	(On Diff #105651)	Minor nit, the variable "imm" should be Imm, we can fix it in this patch.
2167 ↗	(On Diff #105651)	Same here.
2194 ↗	(On Diff #105651)	This one is already correct.
lib/Target/PowerPC/PPCRegisterInfo.cpp
895 ↗	(On Diff #105651)	noImmForm --> NoImmForm
test/CodeGen/PowerPC/PR33671.ll
1 ↗	(On Diff #105651)	These Attrs are unnecessary, right?

Although the instruction doesn't require any special alignment, allowing it to be used with weaker alignment allows other passes to modify the offset after ISEL.

For what other passes is this true (aside from places dealing with frame indices, which it seems like you're fixing regardless)?

lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp
274 ↗	(On Diff #105651)	If you use the builtin assembler directly, can a user hit this assert? If so, we should put an actual diagnostic somewhere.

In D35007#805153, @hfinkel wrote:

Although the instruction doesn't require any special alignment, allowing it to be used with weaker alignment allows other passes to modify the offset after ISEL.

For what other passes is this true (aside from places dealing with frame indices, which it seems like you're fixing regardless)?

I actually don't know that. I thought I had seen that before I fixed all the FrameIndex stuff, but looking back on it - it was also FI related. Do you think I should go back to the original way of testing for the displacement operand, but keep the FI fixes?

lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp
274 ↗	(On Diff #105651)	No, the internal assembler already handles the diagnostic because the operand is defined correctly: `isS16ImmX16` already tests whether the operand is a multiple of 16.
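
As a rough illustration of the kind of check an operand predicate like `isS16ImmX16` performs, here is a standalone sketch. It is an assumption about the shape of the test, not a copy of the LLVM implementation, and the function name is hypothetical.

```cpp
#include <cstdint>

// Hypothetical stand-in for an assembler operand predicate: the immediate
// must fit in a signed 16-bit field and be a multiple of 16.
bool isSigned16BitMultipleOf16(int64_t imm) {
  const bool fitsS16 = imm >= INT16_MIN && imm <= INT16_MAX;
  return fitsS16 && (imm % 16) == 0;
}
```

With a predicate of this shape on the operand, hand-written assembly with a bad displacement is rejected with a diagnostic at parse time, so the assert in the code emitter should only be reachable through a compiler bug.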

In D35007#805378, @nemanjai wrote:

In D35007#805153, @hfinkel wrote:

Although the instruction doesn't require any special alignment, allowing it to be used with weaker alignment allows other passes to modify the offset after ISEL.

For what other passes is this true (aside from places dealing with frame indices, which it seems like you're fixing regardless)?

I actually don't know that. I thought I had seen that before I fixed all the FrameIndex stuff, but looking back on it - it was also FI related. Do you think I should go back to the original way of testing for the displacement operand, but keep the FI fixes?

Yes. (although you should still change the name of the tablegen predicates to something that does not imply that we're checking alignments).

Kept the fixes to the FrameIndex issues but reverted the predicate for selecting the DQ-Form instructions to use the displacement being a multiple of 16.

As it turns out, the problem reported in the bug is much more serious and prevalent - affecting P9 code generation for most applications. This is really something we should get into the 5.0 release as we don't want to ship a release with this bug. We branch the release next Wednesday so I'm really hoping we can get this reviewed and committed by Friday of this week to give it a couple of days in trunk before we branch.

@hfinkel @echristo @kbarton I know I'm asking a lot, but would you please give this patch your prompt attention due to the prevalence of the problem and the imminence of the release branch.

LGTM

This revision is now accepted and ready to land. Jul 13 2017, 6:32 AM

Closed by commit rL307934: [PowerPC] Ensure displacements for DQ-Form instructions are multiples of 16 (authored by nemanjai). Jul 13 2017, 11:17 AM

This revision was automatically updated to reflect the committed changes.

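Revision Contents

Path	Size
llvm/trunk/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp	3 lines
llvm/trunk/lib/Target/PowerPC/PPCISelDAGToDAG.cpp	28 lines
llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h	2 lines
llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp	18 lines
llvm/trunk/lib/Target/PowerPC/PPCInstrInfo.td	22 lines
llvm/trunk/lib/Target/PowerPC/PPCInstrVSX.td	90 lines
llvm/trunk/lib/Target/PowerPC/PPCRegisterInfo.cpp	30 lines
llvm/trunk/test/CodeGen/PowerPC/PR33671.ll	32 lines
llvm/trunk/test/CodeGen/PowerPC/build-vector-tests.ll	40 lines
llvm/trunk/test/CodeGen/PowerPC/ppc64-i128-abi.ll	6 lines
llvm/trunk/test/CodeGen/PowerPC/swaps-le-6.ll	8 lines
llvm/trunk/test/CodeGen/PowerPC/vsx-p9.ll	48 lines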

Diff 106477

llvm/trunk/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp

@@ 265 lines hidden @@ unsigned PPCMCCodeEmitter::getMemRIX16Encoding(const MCInst &MI, unsigned OpNo,
                                        SmallVectorImpl<MCFixup> &Fixups,
                                        const MCSubtargetInfo &STI) const {
   // Encode (imm, reg) as a memrix16, which has the low 12-bits as the
   // displacement and the next 5 bits as the register #.
   assert(MI.getOperand(OpNo+1).isReg());
   unsigned RegBits = getMachineOpValue(MI, MI.getOperand(OpNo+1), Fixups, STI) << 12;

   const MCOperand &MO = MI.getOperand(OpNo);
-  assert(MO.isImm());
+  assert(MO.isImm() && !(MO.getImm() % 16) &&
+         "Expecting an immediate that is a multiple of 16");

   return ((getMachineOpValue(MI, MO, Fixups, STI) >> 4) & 0xFFF) | RegBits;
 }

 unsigned PPCMCCodeEmitter::getSPE8DisEncoding(const MCInst &MI, unsigned OpNo,
                                     SmallVectorImpl<MCFixup> &Fixups,
                                     const MCSubtargetInfo &STI)
                                     const {
@@ 107 lines hidden @@
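
The new assert matters because the `>> 4` in the return statement silently drops the low four bits. The following standalone sketch, with a hypothetical `packMemRIX16` that only mirrors the shape of the encoding above, shows the packing and the check that guards it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical packer mirroring the shape of the memrix16 encoding:
// low 12 bits hold the displacement divided by 16, the next 5 bits
// hold the register number.
uint32_t packMemRIX16(unsigned reg, int64_t imm) {
  assert(reg < 32 && "register number must fit in 5 bits");
  assert(imm % 16 == 0 && "Expecting an immediate that is a multiple of 16");
  const uint32_t regBits = static_cast<uint32_t>(reg) << 12;
  // Dividing is exact here because imm is a multiple of 16.
  const uint32_t dispBits = static_cast<uint32_t>(imm / 16) & 0xFFF;
  return dispBits | regBits;
}
```

Without the multiple-of-16 check, a displacement of 20 would produce the same field value as 16 (`(20 >> 4) & 0xFFF` is 1), so the emitted instruction would quietly address the wrong location.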

llvm/trunk/lib/Target/PowerPC/PPCISelDAGToDAG.cpp

@@ 172 lines hidden @@ public:
     /// specified condition code, returning the CR# of the expression.
     SDValue SelectCC(SDValue LHS, SDValue RHS, ISD::CondCode CC,
                      const SDLoc &dl);

     /// SelectAddrImm - Returns true if the address N can be represented by
     /// a base register plus a signed 16-bit displacement [r+imm].
     bool SelectAddrImm(SDValue N, SDValue &Disp,
                        SDValue &Base) {
-      return PPCLowering->SelectAddressRegImm(N, Disp, Base, *CurDAG, false);
+      return PPCLowering->SelectAddressRegImm(N, Disp, Base, *CurDAG, 0);
     }

     /// SelectAddrImmOffs - Return true if the operand is valid for a preinc
     /// immediate field. Note that the operand at this point is already the
     /// result of a prior SelectAddressRegImm call.
     bool SelectAddrImmOffs(SDValue N, SDValue &Out) const {
       if (N.getOpcode() == ISD::TargetConstant ||
           N.getOpcode() == ISD::TargetGlobalAddress) {
@@ 16 lines hidden @@ public:
     bool SelectAddrIdxOnly(SDValue N, SDValue &Base, SDValue &Index) {
       return PPCLowering->SelectAddressRegRegOnly(N, Base, Index, *CurDAG);
     }

     /// SelectAddrImmX4 - Returns true if the address N can be represented by
     /// a base register plus a signed 16-bit displacement that is a multiple of 4.
     /// Suitable for use by STD and friends.
     bool SelectAddrImmX4(SDValue N, SDValue &Disp, SDValue &Base) {
-      return PPCLowering->SelectAddressRegImm(N, Disp, Base, *CurDAG, true);
+      return PPCLowering->SelectAddressRegImm(N, Disp, Base, *CurDAG, 4);
+    }
+
+    bool SelectAddrImmX16(SDValue N, SDValue &Disp, SDValue &Base) {
+      return PPCLowering->SelectAddressRegImm(N, Disp, Base, *CurDAG, 16);
     }

     // Select an address into a single register.
     bool SelectAddr(SDValue N, SDValue &Base) {
       Base = N;
       return true;
     }

@@ 77 lines hidden @@ private:
     void PeepholeCROps();

     SDValue combineToCMPB(SDNode *N);
     void foldBoolExts(SDValue &Res, SDNode *&N);

     bool AllUsersSelectZero(SDNode *N);
     void SwapAllSelectUsers(SDNode *N);

+    bool isOffsetMultipleOf(SDNode *N, unsigned Val) const;
     void transferMemOperands(SDNode *N, SDNode *Result);
   };

 } // end anonymous namespace

 /// InsertVRSaveCode - Once the entire function has been instruction selected,
 /// all virtual registers are created and all machine instructions are built,
 /// check to see if we need to save/restore VRSAVE. If so, do it.
@@ 2,678 lines hidden @@ if (IsSext && Inputs32Bit)
     return get32BitSExtCompare(LHS, RHS, CC, RHSValue, dl);
   else if (Inputs32Bit)
     return get32BitZExtCompare(LHS, RHS, CC, RHSValue, dl);
   else if (IsSext)
     return get64BitSExtCompare(LHS, RHS, CC, RHSValue, dl);
   return get64BitZExtCompare(LHS, RHS, CC, RHSValue, dl);
 }

+/// Does this node represent a load/store node whose address can be represented
+/// with a register plus an immediate that's a multiple of \p Val:
+bool PPCDAGToDAGISel::isOffsetMultipleOf(SDNode *N, unsigned Val) const {
+  LoadSDNode *LDN = dyn_cast<LoadSDNode>(N);
+  StoreSDNode *STN = dyn_cast<StoreSDNode>(N);
+  SDValue AddrOp;
+  if (LDN)
+    AddrOp = LDN->getOperand(1);
+  else if (STN)
+    AddrOp = STN->getOperand(2);
+
+  short Imm = 0;
+  if (AddrOp.getOpcode() == ISD::ADD)
+    return isIntS16Immediate(AddrOp.getOperand(1), Imm) && !(Imm % Val);
+
+  // If the address comes from the outside, the offset will be zero.
+  return AddrOp.getOpcode() == ISD::CopyFromReg;
+}
+
 void PPCDAGToDAGISel::transferMemOperands(SDNode *N, SDNode *Result) {
   // Transfer memoperands.
   MachineSDNode::mmo_iterator MemOp = MF->allocateMemRefsArray(1);
   MemOp[0] = cast<MemSDNode>(N)->getMemOperand();
   cast<MachineSDNode>(Result)->setMemRefs(MemOp, MemOp + 1);
 }

 // Select - Convert the specified operand from a target-independent to a
@@ 2,102 lines hidden @@
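
The decision made by `isOffsetMultipleOf` can be tried out without the SelectionDAG machinery. The sketch below is a toy model only: `ToyOpcode`, `ToyAddr` and `toyIsOffsetMultipleOf` are hypothetical stand-ins for the real `SDValue` address operand, and the model assumes `val` is non-zero (the patch only ever passes 16).

```cpp
#include <cstdint>

// Toy stand-ins for the address node kinds the helper distinguishes.
enum class ToyOpcode { Add, CopyFromReg, Other };

struct ToyAddr {
  ToyOpcode opcode;
  bool hasConstantRHS;   // only meaningful when opcode == Add
  int64_t constantRHS;   // the constant added to the base, if any
};

// Mirrors the three outcomes of the helper in the diff. val must be non-zero.
bool toyIsOffsetMultipleOf(const ToyAddr &addr, unsigned val) {
  if (addr.opcode == ToyOpcode::Add) {
    // base + constant: the constant must fit in a signed 16-bit immediate
    // and be a multiple of val.
    const bool fitsS16 = addr.hasConstantRHS &&
                         addr.constantRHS >= INT16_MIN &&
                         addr.constantRHS <= INT16_MAX;
    return fitsS16 && addr.constantRHS % static_cast<int64_t>(val) == 0;
  }
  // An address that arrives in a register has an implicit offset of zero.
  return addr.opcode == ToyOpcode::CopyFromReg;
}
```

Any other address shape returns false in this model, which corresponds to the X-Form patterns being selected instead.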

llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h

@@ 610 lines hidden @@ public:
     bool SelectAddressRegReg(SDValue N, SDValue &Base, SDValue &Index,
                              SelectionDAG &DAG) const;

     /// SelectAddressRegImm - Returns true if the address N can be represented
     /// by a base register plus a signed 16-bit displacement [r+imm], and if it
     /// is not better represented as reg+reg. If Aligned is true, only accept
     /// displacements suitable for STD and friends, i.e. multiples of 4.
     bool SelectAddressRegImm(SDValue N, SDValue &Disp, SDValue &Base,
-                             SelectionDAG &DAG, bool Aligned) const;
+                             SelectionDAG &DAG, unsigned Alignment) const;

     /// SelectAddressRegRegOnly - Given the specified addressed, force it to be
     /// represented as an indexed [r+r] operation.
     bool SelectAddressRegRegOnly(SDValue N, SDValue &Base, SDValue &Index,
                                  SelectionDAG &DAG) const;

     Sched::Preference getSchedulingPreference(SDNode *N) const override;

@@ 477 lines hidden @@

llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp

@@ 2,124 lines hidden @@ if (Align >= 4)
     return;

   PPCFunctionInfo *FuncInfo = MF.getInfo<PPCFunctionInfo>();
   FuncInfo->setHasNonRISpills();
 }

 /// Returns true if the address N can be represented by a base register plus
 /// a signed 16-bit displacement [r+imm], and if it is not better
-/// represented as reg+reg. If Aligned is true, only accept displacements
-/// suitable for STD and friends, i.e. multiples of 4.
+/// represented as reg+reg. If \p Alignment is non-zero, only accept
+/// displacements that are multiples of that value.
 bool PPCTargetLowering::SelectAddressRegImm(SDValue N, SDValue &Disp,
                                             SDValue &Base,
                                             SelectionDAG &DAG,
-                                            bool Aligned) const {
+                                            unsigned Alignment) const {
   // FIXME dl should come from parent load or store, not from address
   SDLoc dl(N);
   // If this can be more profitably realized as r+r, fail.
   if (SelectAddressRegReg(N, Disp, Base, DAG))
     return false;

   if (N.getOpcode() == ISD::ADD) {
     int16_t imm = 0;
     if (isIntS16Immediate(N.getOperand(1), imm) &&
-        (!Aligned || (imm & 3) == 0)) {
+        (!Alignment || (imm % Alignment) == 0)) {
       Disp = DAG.getTargetConstant(imm, dl, N.getValueType());
       if (FrameIndexSDNode *FI = dyn_cast<FrameIndexSDNode>(N.getOperand(0))) {
         Base = DAG.getTargetFrameIndex(FI->getIndex(), N.getValueType());
         fixupFuncForFI(DAG, FI->getIndex(), N.getValueType());
       } else {
         Base = N.getOperand(0);
       }
       return true; // [r+i]
     } else if (N.getOperand(1).getOpcode() == PPCISD::Lo) {
       // Match LOAD (ADD (X, Lo(G))).
       assert(!cast<ConstantSDNode>(N.getOperand(1).getOperand(1))->getZExtValue()
              && "Cannot handle constant offsets yet!");
       Disp = N.getOperand(1).getOperand(0); // The global address.
       assert(Disp.getOpcode() == ISD::TargetGlobalAddress ||
              Disp.getOpcode() == ISD::TargetGlobalTLSAddress ||
              Disp.getOpcode() == ISD::TargetConstantPool ||
              Disp.getOpcode() == ISD::TargetJumpTable);
       Base = N.getOperand(0);
       return true; // [&g+r]
     }
   } else if (N.getOpcode() == ISD::OR) {
     int16_t imm = 0;
     if (isIntS16Immediate(N.getOperand(1), imm) &&
-        (!Aligned || (imm & 3) == 0)) {
+        (!Alignment || (imm % Alignment) == 0)) {
       // If this is an or of disjoint bitfields, we can codegen this as an add
       // (for better address arithmetic) if the LHS and RHS of the OR are
       // provably disjoint.
       KnownBits LHSKnown;
       DAG.computeKnownBits(N.getOperand(0), LHSKnown);

       if ((LHSKnown.Zero.getZExtValue()|~(uint64_t)imm) == ~0ULL) {
         // If all of the bits are known zero on the LHS or RHS, the add won't
@@ 10 lines hidden @@ if (isIntS16Immediate(N.getOperand(1), imm) &&
       }
     }
   } else if (ConstantSDNode *CN = dyn_cast<ConstantSDNode>(N)) {
     // Loading from a constant address.

     // If this address fits entirely in a 16-bit sext immediate field, codegen
     // this as "d, 0"
     int16_t Imm;
-    if (isIntS16Immediate(CN, Imm) && (!Aligned || (Imm & 3) == 0)) {
+    if (isIntS16Immediate(CN, Imm) && (!Alignment || (Imm % Alignment) == 0)) {
       Disp = DAG.getTargetConstant(Imm, dl, CN->getValueType(0));
       Base = DAG.getRegister(Subtarget.isPPC64() ? PPC::ZERO8 : PPC::ZERO,
                              CN->getValueType(0));
       return true;
     }

     // Handle 32-bit sext immediates with LIS + addr mode.
     if ((CN->getValueType(0) == MVT::i32 ||
          (int64_t)CN->getZExtValue() == (int)CN->getZExtValue()) &&
-        (!Aligned || (CN->getZExtValue() & 3) == 0)) {
+        (!Alignment || (CN->getZExtValue() % Alignment) == 0)) {
       int Addr = (int)CN->getZExtValue();

       // Otherwise, break this down into an LIS + disp.
       Disp = DAG.getTargetConstant((short)Addr, dl, MVT::i32);

       Base = DAG.getTargetConstant((Addr - (signed short)Addr) >> 16, dl,
                                    MVT::i32);
       unsigned Opc = CN->getValueType(0) == MVT::i32 ? PPC::LIS : PPC::LIS8;
@@ 98 lines hidden @@ if (Swap)
       std::swap(Base, Offset);

     AM = ISD::PRE_INC;
     return true;
   }

   // LDU/STU can only handle immediates that are a multiple of 4.
   if (VT != MVT::i64) {
-    if (!SelectAddressRegImm(Ptr, Offset, Base, DAG, false))
+    if (!SelectAddressRegImm(Ptr, Offset, Base, DAG, 0))
       return false;
   } else {
     // LDU/STU need an address with at least 4-byte alignment.
     if (Alignment < 4)
       return false;

-    if (!SelectAddressRegImm(Ptr, Offset, Base, DAG, true))
+    if (!SelectAddressRegImm(Ptr, Offset, Base, DAG, 4))
       return false;
   }

   if (LoadSDNode *LD = dyn_cast<LoadSDNode>(N)) {
     // PPC64 doesn't have lwau, but it does have lwaux. Reject preinc load of
     // sext i32 to i64 when addr mode is r+i.
     if (LD->getValueType(0) == MVT::i64 && LD->getMemoryVT() == MVT::i32 &&
         LD->getExtensionType() == ISD::SEXTLOAD &&
@@ 11,130 lines hidden @@
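
One thing worth checking about this change is that replacing `(imm & 3) == 0` with `(imm % Alignment) == 0` keeps the existing behaviour for the `false`/`true` callers, which now pass `0`/`4`. The sketch below is an independent check with hypothetical function names, not code from the patch. It compares the two forms over every 16-bit immediate, including negative ones, where the `int16_t` operand is converted to `unsigned` before the modulo.

```cpp
#include <cstdint>

// The pre-patch condition: a boolean selects the multiple-of-4 constraint.
bool oldCheck(int16_t imm, bool aligned) {
  return !aligned || (imm & 3) == 0;
}

// The post-patch condition: 0 means "no constraint", otherwise the
// displacement must be a multiple of the given value.
bool newCheck(int16_t imm, unsigned alignment) {
  return !alignment || (imm % alignment) == 0;
}

// Exhaustively compare both forms for the two pre-existing use cases.
bool checksAgreeForAllImmediates() {
  for (int v = INT16_MIN; v <= INT16_MAX; ++v) {
    const int16_t imm = static_cast<int16_t>(v);
    if (oldCheck(imm, false) != newCheck(imm, 0)) return false;
    if (oldCheck(imm, true) != newCheck(imm, 4)) return false;
  }
  return true;
}
```

The two forms agree for negative immediates because the unsigned conversion wraps modulo a power of two, which preserves divisibility by 4 and by 16. That reasoning would not carry over to a value that is not a power of two.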

llvm/trunk/lib/Target/PowerPC/PPCInstrInfo.td

@@ 399 lines hidden @@
 def unaligned4store : PatFrag<(ops node:$val, node:$ptr),
                               (store node:$val, node:$ptr), [{
   return cast<StoreSDNode>(N)->getAlignment() < 4;
 }]>;
 def unaligned4sextloadi32 : PatFrag<(ops node:$ptr), (sextloadi32 node:$ptr), [{
   return cast<LoadSDNode>(N)->getAlignment() < 4;
 }]>;

+// This is a somewhat weaker condition than actually checking for 16-byte
+// alignment. It is simply checking that the displacement can be represented
+// as an immediate that is a multiple of 16 (i.e. the requirements for DQ-Form
+// instructions).
+def quadwOffsetLoad : PatFrag<(ops node:$ptr), (load node:$ptr), [{
+  return isOffsetMultipleOf(N, 16);
+}]>;
+def quadwOffsetStore : PatFrag<(ops node:$val, node:$ptr),
+                               (store node:$val, node:$ptr), [{
+  return isOffsetMultipleOf(N, 16);
+}]>;
+def nonQuadwOffsetLoad : PatFrag<(ops node:$ptr), (load node:$ptr), [{
+  return !isOffsetMultipleOf(N, 16);
+}]>;
+def nonQuadwOffsetStore : PatFrag<(ops node:$val, node:$ptr),
+                                  (store node:$val, node:$ptr), [{
+  return !isOffsetMultipleOf(N, 16);
+}]>;
+
 //===----------------------------------------------------------------------===//
 // PowerPC Flag Definitions.

 class isPPC64 { bit PPC64 = 1; }
 class isDOT { bit RC = 1; }

 class RegConstraint<string C> {
   string Constraints = C;
@@ 394 lines hidden @@ def pred : Operand<OtherVT> {
   let PrintMethod = "printPredicateOperand";
   let MIOperandInfo = (ops i32imm:$bibo, crrc:$reg);
 }

 // Define PowerPC specific addressing mode.
 def iaddr : ComplexPattern<iPTR, 2, "SelectAddrImm", [], []>;
 def xaddr : ComplexPattern<iPTR, 2, "SelectAddrIdx", [], []>;
 def xoaddr : ComplexPattern<iPTR, 2, "SelectAddrIdxOnly",[], []>;
 def ixaddr : ComplexPattern<iPTR, 2, "SelectAddrImmX4", [], []>; // "std"
+def iqaddr : ComplexPattern<iPTR, 2, "SelectAddrImmX16", [], []>; // "stxv"

 // The address in a single register. This is used with the SjLj
 // pseudo-instructions.
 def addr : ComplexPattern<iPTR, 1, "SelectAddr",[], []>;

 /// This is just the offset part of iaddr, used for preinc.
 def iaddroff : ComplexPattern<iPTR, 1, "SelectAddrImmOffs", [], []>;

@@ 3,817 lines hidden @@

llvm/trunk/lib/Target/PowerPC/PPCInstrVSX.td

@@ 2,600 lines hidden @@ def : Pat<(v4f32 (insertelt v4f32:$A, f32:$B, 1)),
               (v4f32 (XXINSERTW v4f32:$A, AlignValues.F32_TO_BE_WORD1, 4))>;
     def : Pat<(v4f32 (insertelt v4f32:$A, f32:$B, 2)),
               (v4f32 (XXINSERTW v4f32:$A, AlignValues.F32_TO_BE_WORD1, 8))>;
     def : Pat<(v4f32 (insertelt v4f32:$A, f32:$B, 3)),
               (v4f32 (XXINSERTW v4f32:$A, AlignValues.F32_TO_BE_WORD1, 12))>;
   } // IsLittleEndian, HasP9Vector

   // D-Form Load/Store
-  def : Pat<(v4i32 (load iaddr:$src)), (LXV memrix16:$src)>;
-  def : Pat<(v4f32 (load iaddr:$src)), (LXV memrix16:$src)>;
-  def : Pat<(v2i64 (load iaddr:$src)), (LXV memrix16:$src)>;
-  def : Pat<(v2f64 (load iaddr:$src)), (LXV memrix16:$src)>;
-  def : Pat<(v4i32 (int_ppc_vsx_lxvw4x iaddr:$src)), (LXV memrix16:$src)>;
-  def : Pat<(v2f64 (int_ppc_vsx_lxvd2x iaddr:$src)), (LXV memrix16:$src)>;
+  def : Pat<(v4i32 (quadwOffsetLoad iqaddr:$src)), (LXV memrix16:$src)>;
+  def : Pat<(v4f32 (quadwOffsetLoad iqaddr:$src)), (LXV memrix16:$src)>;
+  def : Pat<(v2i64 (quadwOffsetLoad iqaddr:$src)), (LXV memrix16:$src)>;
+  def : Pat<(v2f64 (quadwOffsetLoad iqaddr:$src)), (LXV memrix16:$src)>;
+  def : Pat<(v4i32 (int_ppc_vsx_lxvw4x iqaddr:$src)), (LXV memrix16:$src)>;
+  def : Pat<(v2f64 (int_ppc_vsx_lxvd2x iqaddr:$src)), (LXV memrix16:$src)>;

-  def : Pat<(store v4f32:$rS, iaddr:$dst), (STXV $rS, memrix16:$dst)>;
-  def : Pat<(store v4i32:$rS, iaddr:$dst), (STXV $rS, memrix16:$dst)>;
-  def : Pat<(store v2f64:$rS, iaddr:$dst), (STXV $rS, memrix16:$dst)>;
-  def : Pat<(store v2i64:$rS, iaddr:$dst), (STXV $rS, memrix16:$dst)>;
-  def : Pat<(int_ppc_vsx_stxvw4x v4i32:$rS, iaddr:$dst),
-            (STXV $rS, memrix16:$dst)>;
-  def : Pat<(int_ppc_vsx_stxvd2x v2f64:$rS, iaddr:$dst),
-            (STXV $rS, memrix16:$dst)>;
+  def : Pat<(quadwOffsetStore v4f32:$rS, iqaddr:$dst), (STXV $rS, memrix16:$dst)>;
+  def : Pat<(quadwOffsetStore v4i32:$rS, iqaddr:$dst), (STXV $rS, memrix16:$dst)>;
+  def : Pat<(quadwOffsetStore v2f64:$rS, iqaddr:$dst), (STXV $rS, memrix16:$dst)>;
+  def : Pat<(quadwOffsetStore v2i64:$rS, iqaddr:$dst), (STXV $rS, memrix16:$dst)>;
+  def : Pat<(int_ppc_vsx_stxvw4x v4i32:$rS, iqaddr:$dst),
+            (STXV $rS, memrix16:$dst)>;
+  def : Pat<(int_ppc_vsx_stxvd2x v2f64:$rS, iqaddr:$dst),
+            (STXV $rS, memrix16:$dst)>;


-  def : Pat<(v2f64 (load xaddr:$src)), (LXVX xaddr:$src)>;
-  def : Pat<(v2i64 (load xaddr:$src)), (LXVX xaddr:$src)>;
-  def : Pat<(v4f32 (load xaddr:$src)), (LXVX xaddr:$src)>;
-  def : Pat<(v4i32 (load xaddr:$src)), (LXVX xaddr:$src)>;
-  def : Pat<(v4i32 (int_ppc_vsx_lxvw4x xaddr:$src)), (LXVX xaddr:$src)>;
-  def : Pat<(v2f64 (int_ppc_vsx_lxvd2x xaddr:$src)), (LXVX xaddr:$src)>;
-  def : Pat<(store v2f64:$rS, xaddr:$dst), (STXVX $rS, xaddr:$dst)>;
-  def : Pat<(store v2i64:$rS, xaddr:$dst), (STXVX $rS, xaddr:$dst)>;
-  def : Pat<(store v4f32:$rS, xaddr:$dst), (STXVX $rS, xaddr:$dst)>;
-  def : Pat<(store v4i32:$rS, xaddr:$dst), (STXVX $rS, xaddr:$dst)>;
-  def : Pat<(int_ppc_vsx_stxvw4x v4i32:$rS, xaddr:$dst),
-            (STXVX $rS, xaddr:$dst)>;
-  def : Pat<(int_ppc_vsx_stxvd2x v2f64:$rS, xaddr:$dst),
-            (STXVX $rS, xaddr:$dst)>;
+  def : Pat<(v2f64 (nonQuadwOffsetLoad xoaddr:$src)), (LXVX xoaddr:$src)>;
+  def : Pat<(v2i64 (nonQuadwOffsetLoad xoaddr:$src)), (LXVX xoaddr:$src)>;
+  def : Pat<(v4f32 (nonQuadwOffsetLoad xoaddr:$src)), (LXVX xoaddr:$src)>;
+  def : Pat<(v4i32 (nonQuadwOffsetLoad xoaddr:$src)), (LXVX xoaddr:$src)>;
+  def : Pat<(v4i32 (int_ppc_vsx_lxvw4x xoaddr:$src)), (LXVX xoaddr:$src)>;
+  def : Pat<(v2f64 (int_ppc_vsx_lxvd2x xoaddr:$src)), (LXVX xoaddr:$src)>;
+  def : Pat<(nonQuadwOffsetStore v2f64:$rS, xoaddr:$dst),
+            (STXVX $rS, xoaddr:$dst)>;
+  def : Pat<(nonQuadwOffsetStore v2i64:$rS, xoaddr:$dst),
+            (STXVX $rS, xoaddr:$dst)>;
+  def : Pat<(nonQuadwOffsetStore v4f32:$rS, xoaddr:$dst),
+            (STXVX $rS, xoaddr:$dst)>;
+  def : Pat<(nonQuadwOffsetStore v4i32:$rS, xoaddr:$dst),
+            (STXVX $rS, xoaddr:$dst)>;
+  def : Pat<(int_ppc_vsx_stxvw4x v4i32:$rS, xoaddr:$dst),
+            (STXVX $rS, xoaddr:$dst)>;
+  def : Pat<(int_ppc_vsx_stxvd2x v2f64:$rS, xoaddr:$dst),
+            (STXVX $rS, xoaddr:$dst)>;
   def : Pat<(v4i32 (scalar_to_vector (i32 (load xoaddr:$src)))),
(v4i32 (LXVWSX xoaddr:$src))>;		(v4i32 (LXVWSX xoaddr:$src))>;
def : Pat<(v4f32 (scalar_to_vector (f32 (load xoaddr:$src)))),		def : Pat<(v4f32 (scalar_to_vector (f32 (load xoaddr:$src)))),
(v4f32 (LXVWSX xoaddr:$src))>;		(v4f32 (LXVWSX xoaddr:$src))>;
def : Pat<(v4f32 (scalar_to_vector (f32 (fpround (extloadf32 xoaddr:$src))))),		def : Pat<(v4f32 (scalar_to_vector (f32 (fpround (extloadf32 xoaddr:$src))))),
(v4f32 (LXVWSX xoaddr:$src))>;		(v4f32 (LXVWSX xoaddr:$src))>;

// Build vectors from i8 loads		// Build vectors from i8 loads
[... 135 lines not shown ...]	let AddedComplexity = 400, Predicates = [HasP9Vector] in {
def : Pat<(f64 (PPCVexts f64:$A, 1)),		def : Pat<(f64 (PPCVexts f64:$A, 1)),
(f64 (COPY_TO_REGCLASS (VEXTSB2Ds $A), VSFRC))>;		(f64 (COPY_TO_REGCLASS (VEXTSB2Ds $A), VSFRC))>;
def : Pat<(f64 (PPCVexts f64:$A, 2)),		def : Pat<(f64 (PPCVexts f64:$A, 2)),
(f64 (COPY_TO_REGCLASS (VEXTSH2Ds $A), VSFRC))>;		(f64 (COPY_TO_REGCLASS (VEXTSH2Ds $A), VSFRC))>;

let isPseudo = 1 in {		let isPseudo = 1 in {
def DFLOADf32 : Pseudo<(outs vssrc:$XT), (ins memrix:$src),		def DFLOADf32 : Pseudo<(outs vssrc:$XT), (ins memrix:$src),
"#DFLOADf32",		"#DFLOADf32",
[(set f32:$XT, (load iaddr:$src))]>;		[(set f32:$XT, (load ixaddr:$src))]>;
def DFLOADf64 : Pseudo<(outs vsfrc:$XT), (ins memrix:$src),		def DFLOADf64 : Pseudo<(outs vsfrc:$XT), (ins memrix:$src),
"#DFLOADf64",		"#DFLOADf64",
[(set f64:$XT, (load iaddr:$src))]>;		[(set f64:$XT, (load ixaddr:$src))]>;
def DFSTOREf32 : Pseudo<(outs), (ins vssrc:$XT, memrix:$dst),		def DFSTOREf32 : Pseudo<(outs), (ins vssrc:$XT, memrix:$dst),
"#DFSTOREf32",		"#DFSTOREf32",
[(store f32:$XT, iaddr:$dst)]>;		[(store f32:$XT, ixaddr:$dst)]>;
def DFSTOREf64 : Pseudo<(outs), (ins vsfrc:$XT, memrix:$dst),		def DFSTOREf64 : Pseudo<(outs), (ins vsfrc:$XT, memrix:$dst),
"#DFSTOREf64",		"#DFSTOREf64",
[(store f64:$XT, iaddr:$dst)]>;		[(store f64:$XT, ixaddr:$dst)]>;
}		}
def : Pat<(f64 (extloadf32 iaddr:$src)),		def : Pat<(f64 (extloadf32 ixaddr:$src)),
(COPY_TO_REGCLASS (DFLOADf32 iaddr:$src), VSFRC)>;		(COPY_TO_REGCLASS (DFLOADf32 ixaddr:$src), VSFRC)>;
def : Pat<(f32 (fpround (extloadf32 iaddr:$src))),		def : Pat<(f32 (fpround (extloadf32 ixaddr:$src))),
(f32 (DFLOADf32 iaddr:$src))>;		(f32 (DFLOADf32 ixaddr:$src))>;
} // end HasP9Vector, AddedComplexity		} // end HasP9Vector, AddedComplexity

// Integer extend helper dags 32 -> 64		// Integer extend helper dags 32 -> 64
def AnyExts {		def AnyExts {
dag A = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $A, sub_32);		dag A = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $A, sub_32);
dag B = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $B, sub_32);		dag B = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $B, sub_32);
dag C = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $C, sub_32);		dag C = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $C, sub_32);
dag D = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $D, sub_32);		dag D = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $D, sub_32);
[... 62 lines not shown ...]
}		}
def FltToUIntLoad {		def FltToUIntLoad {
dag A = (i32 (PPCmfvsr (PPCfctiwuz (f64 (extloadf32 xoaddr:$A)))));		dag A = (i32 (PPCmfvsr (PPCfctiwuz (f64 (extloadf32 xoaddr:$A)))));
}		}
def FltToLongLoad {		def FltToLongLoad {
dag A = (i64 (PPCmfvsr (PPCfctidz (f64 (extloadf32 xoaddr:$A)))));		dag A = (i64 (PPCmfvsr (PPCfctidz (f64 (extloadf32 xoaddr:$A)))));
}		}
def FltToLongLoadP9 {		def FltToLongLoadP9 {
dag A = (i64 (PPCmfvsr (PPCfctidz (f64 (extloadf32 iaddr:$A)))));		dag A = (i64 (PPCmfvsr (PPCfctidz (f64 (extloadf32 ixaddr:$A)))));
}		}
def FltToULongLoad {		def FltToULongLoad {
dag A = (i64 (PPCmfvsr (PPCfctiduz (f64 (extloadf32 xoaddr:$A)))));		dag A = (i64 (PPCmfvsr (PPCfctiduz (f64 (extloadf32 xoaddr:$A)))));
}		}
def FltToULongLoadP9 {		def FltToULongLoadP9 {
dag A = (i64 (PPCmfvsr (PPCfctiduz (f64 (extloadf32 iaddr:$A)))));		dag A = (i64 (PPCmfvsr (PPCfctiduz (f64 (extloadf32 ixaddr:$A)))));
}		}
def FltToLong {		def FltToLong {
dag A = (i64 (PPCmfvsr (PPCfctidz (fpextend f32:$A))));		dag A = (i64 (PPCmfvsr (PPCfctidz (fpextend f32:$A))));
}		}
def FltToULong {		def FltToULong {
dag A = (i64 (PPCmfvsr (PPCfctiduz (fpextend f32:$A))));		dag A = (i64 (PPCmfvsr (PPCfctiduz (fpextend f32:$A))));
}		}
def DblToInt {		def DblToInt {
dag A = (i32 (PPCmfvsr (f64 (PPCfctiwz f64:$A))));		dag A = (i32 (PPCmfvsr (f64 (PPCfctiwz f64:$A))));
}		}
def DblToUInt {		def DblToUInt {
dag A = (i32 (PPCmfvsr (f64 (PPCfctiwuz f64:$A))));		dag A = (i32 (PPCmfvsr (f64 (PPCfctiwuz f64:$A))));
}		}
def DblToLong {		def DblToLong {
dag A = (i64 (PPCmfvsr (f64 (PPCfctidz f64:$A))));		dag A = (i64 (PPCmfvsr (f64 (PPCfctidz f64:$A))));
}		}
def DblToULong {		def DblToULong {
dag A = (i64 (PPCmfvsr (f64 (PPCfctiduz f64:$A))));		dag A = (i64 (PPCmfvsr (f64 (PPCfctiduz f64:$A))));
}		}
def DblToIntLoad {		def DblToIntLoad {
dag A = (i32 (PPCmfvsr (PPCfctiwz (f64 (load xoaddr:$A)))));		dag A = (i32 (PPCmfvsr (PPCfctiwz (f64 (load xoaddr:$A)))));
}		}
def DblToIntLoadP9 {		def DblToIntLoadP9 {
dag A = (i32 (PPCmfvsr (PPCfctiwz (f64 (load iaddr:$A)))));		dag A = (i32 (PPCmfvsr (PPCfctiwz (f64 (load ixaddr:$A)))));
}		}
def DblToUIntLoad {		def DblToUIntLoad {
dag A = (i32 (PPCmfvsr (PPCfctiwuz (f64 (load xoaddr:$A)))));		dag A = (i32 (PPCmfvsr (PPCfctiwuz (f64 (load xoaddr:$A)))));
}		}
def DblToUIntLoadP9 {		def DblToUIntLoadP9 {
dag A = (i32 (PPCmfvsr (PPCfctiwuz (f64 (load iaddr:$A)))));		dag A = (i32 (PPCmfvsr (PPCfctiwuz (f64 (load ixaddr:$A)))));
}		}
def DblToLongLoad {		def DblToLongLoad {
dag A = (i64 (PPCmfvsr (PPCfctidz (f64 (load xoaddr:$A)))));		dag A = (i64 (PPCmfvsr (PPCfctidz (f64 (load xoaddr:$A)))));
}		}
def DblToULongLoad {		def DblToULongLoad {
dag A = (i64 (PPCmfvsr (PPCfctiduz (f64 (load xoaddr:$A)))));		dag A = (i64 (PPCmfvsr (PPCfctiduz (f64 (load xoaddr:$A)))));
}		}

[... 154 lines not shown ...]	let Predicates = [HasP9Vector] in {
def : Pat<(v2i64 immAllOnesV),		def : Pat<(v2i64 immAllOnesV),
(v2i64 (XXSPLTIB 255))>;		(v2i64 (XXSPLTIB 255))>;
def : Pat<(v4i32 (scalar_to_vector FltToIntLoad.A)),		def : Pat<(v4i32 (scalar_to_vector FltToIntLoad.A)),
(v4i32 (XVCVSPSXWS (LXVWSX xoaddr:$A)))>;		(v4i32 (XVCVSPSXWS (LXVWSX xoaddr:$A)))>;
def : Pat<(v4i32 (scalar_to_vector FltToUIntLoad.A)),		def : Pat<(v4i32 (scalar_to_vector FltToUIntLoad.A)),
(v4i32 (XVCVSPUXWS (LXVWSX xoaddr:$A)))>;		(v4i32 (XVCVSPUXWS (LXVWSX xoaddr:$A)))>;
def : Pat<(v4i32 (scalar_to_vector DblToIntLoadP9.A)),		def : Pat<(v4i32 (scalar_to_vector DblToIntLoadP9.A)),
(v4i32 (XXSPLTW (COPY_TO_REGCLASS		(v4i32 (XXSPLTW (COPY_TO_REGCLASS
(XSCVDPSXWS (DFLOADf64 iaddr:$A)), VSRC), 1))>;		(XSCVDPSXWS (DFLOADf64 ixaddr:$A)), VSRC), 1))>;
def : Pat<(v4i32 (scalar_to_vector DblToUIntLoadP9.A)),		def : Pat<(v4i32 (scalar_to_vector DblToUIntLoadP9.A)),
(v4i32 (XXSPLTW (COPY_TO_REGCLASS		(v4i32 (XXSPLTW (COPY_TO_REGCLASS
(XSCVDPUXWS (DFLOADf64 iaddr:$A)), VSRC), 1))>;		(XSCVDPUXWS (DFLOADf64 ixaddr:$A)), VSRC), 1))>;
def : Pat<(v2i64 (scalar_to_vector FltToLongLoadP9.A)),		def : Pat<(v2i64 (scalar_to_vector FltToLongLoadP9.A)),
(v2i64 (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS		(v2i64 (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS
(DFLOADf32 iaddr:$A),		(DFLOADf32 ixaddr:$A),
VSFRC)), 0))>;		VSFRC)), 0))>;
def : Pat<(v2i64 (scalar_to_vector FltToULongLoadP9.A)),		def : Pat<(v2i64 (scalar_to_vector FltToULongLoadP9.A)),
(v2i64 (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS		(v2i64 (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS
(DFLOADf32 iaddr:$A),		(DFLOADf32 ixaddr:$A),
VSFRC)), 0))>;		VSFRC)), 0))>;
}		}

let Predicates = [IsISA3_0, HasDirectMove, IsBigEndian] in {		let Predicates = [IsISA3_0, HasDirectMove, IsBigEndian] in {
def : Pat<(i64 (extractelt v2i64:$A, 1)),		def : Pat<(i64 (extractelt v2i64:$A, 1)),
(i64 (MFVSRLD $A))>;		(i64 (MFVSRLD $A))>;
// Better way to build integer vectors if we have MTVSRDD. Big endian.		// Better way to build integer vectors if we have MTVSRDD. Big endian.
def : Pat<(v2i64 (build_vector i64:$rB, i64:$rA)),		def : Pat<(v2i64 (build_vector i64:$rB, i64:$rA)),
[... 62 lines not shown ...]

llvm/trunk/lib/Target/PowerPC/PPCRegisterInfo.cpp

[... 748 lines not shown ...]	else {
const PPCFunctionInfo *FI = MF.getInfo<PPCFunctionInfo>();		const PPCFunctionInfo *FI = MF.getInfo<PPCFunctionInfo>();
FrameIdx = FI->getCRSpillFrameIndex();		FrameIdx = FI->getCRSpillFrameIndex();
}		}
return true;		return true;
}		}
return false;		return false;
}		}

// Figure out if the offset in the instruction must be a multiple of 4.		// If the offset must be a multiple of some value, return what that value is.
// This is true for instructions like "STD".		static unsigned offsetMinAlign(const MachineInstr &MI) {
static bool usesIXAddr(const MachineInstr &MI) {
unsigned OpC = MI.getOpcode();		unsigned OpC = MI.getOpcode();

switch (OpC) {		switch (OpC) {
default:		default:
return false;		return 1;
case PPC::LWA:		case PPC::LWA:
case PPC::LWA_32:		case PPC::LWA_32:
case PPC::LD:		case PPC::LD:
		case PPC::LDU:
case PPC::STD:		case PPC::STD:
return true;		case PPC::STDU:
		case PPC::DFLOADf32:
		case PPC::DFLOADf64:
		case PPC::DFSTOREf32:
		case PPC::DFSTOREf64:
		case PPC::LXSD:
		case PPC::LXSSP:
		case PPC::STXSD:
		case PPC::STXSSP:
		return 4;
		case PPC::LXV:
		case PPC::STXV:
		return 16;
}		}
}		}

// Return the OffsetOperandNo given the FIOperandNum (and the instruction).		// Return the OffsetOperandNo given the FIOperandNum (and the instruction).
static unsigned getOffsetONFromFION(const MachineInstr &MI,		static unsigned getOffsetONFromFION(const MachineInstr &MI,
unsigned FIOperandNum) {		unsigned FIOperandNum) {
// Take into account whether it's an add or mem instruction		// Take into account whether it's an add or mem instruction
unsigned OffsetOperandNo = (FIOperandNum == 2) ? 1 : 2;		unsigned OffsetOperandNo = (FIOperandNum == 2) ? 1 : 2;
[... 69 lines not shown ...]	if (OpC == PPC::SPILL_CR) {
lowerVRSAVERestore(II, FrameIndex);		lowerVRSAVERestore(II, FrameIndex);
return;		return;
}		}

// Replace the FrameIndex with base register with GPR1 (SP) or GPR31 (FP).		// Replace the FrameIndex with base register with GPR1 (SP) or GPR31 (FP).
MI.getOperand(FIOperandNum).ChangeToRegister(		MI.getOperand(FIOperandNum).ChangeToRegister(
FrameIndex < 0 ? getBaseRegister(MF) : getFrameRegister(MF), false);		FrameIndex < 0 ? getBaseRegister(MF) : getFrameRegister(MF), false);

// Figure out if the offset in the instruction is shifted right two bits.
bool isIXAddr = usesIXAddr(MI);

// If the instruction is not present in ImmToIdxMap, then it has no immediate		// If the instruction is not present in ImmToIdxMap, then it has no immediate
// form (and must be r+r).		// form (and must be r+r).
bool noImmForm = !MI.isInlineAsm() && OpC != TargetOpcode::STACKMAP &&		bool noImmForm = !MI.isInlineAsm() && OpC != TargetOpcode::STACKMAP &&
OpC != TargetOpcode::PATCHPOINT && !ImmToIdxMap.count(OpC);		OpC != TargetOpcode::PATCHPOINT && !ImmToIdxMap.count(OpC);

// Now add the frame object offset to the offset from r1.		// Now add the frame object offset to the offset from r1.
int Offset = MFI.getObjectOffset(FrameIndex);		int Offset = MFI.getObjectOffset(FrameIndex);
Offset += MI.getOperand(OffsetOperandNo).getImm();		Offset += MI.getOperand(OffsetOperandNo).getImm();
[... 12 lines not shown ...]	PPCRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
// If we can, encode the offset directly into the instruction. If this is a		// If we can, encode the offset directly into the instruction. If this is a
// normal PPC "ri" instruction, any 16-bit value can be safely encoded. If		// normal PPC "ri" instruction, any 16-bit value can be safely encoded. If
// this is a PPC64 "ix" instruction, only a 16-bit value with the low two bits		// this is a PPC64 "ix" instruction, only a 16-bit value with the low two bits
// clear can be encoded. This is extremely uncommon, because normally you		// clear can be encoded. This is extremely uncommon, because normally you
// only "std" to a stack slot that is at least 4-byte aligned, but it can		// only "std" to a stack slot that is at least 4-byte aligned, but it can
// happen in invalid code.		// happen in invalid code.
assert(OpC != PPC::DBG_VALUE &&		assert(OpC != PPC::DBG_VALUE &&
"This should be handled in a target-independent way");		"This should be handled in a target-independent way");
if (!noImmForm && ((isInt<16>(Offset) && (!isIXAddr || (Offset & 3) == 0)) ||		if (!noImmForm && ((isInt<16>(Offset) &&
		((Offset % offsetMinAlign(MI)) == 0)) ||
OpC == TargetOpcode::STACKMAP ||		OpC == TargetOpcode::STACKMAP ||
OpC == TargetOpcode::PATCHPOINT)) {		OpC == TargetOpcode::PATCHPOINT)) {
MI.getOperand(OffsetOperandNo).ChangeToImmediate(Offset);		MI.getOperand(OffsetOperandNo).ChangeToImmediate(Offset);
return;		return;
}		}

// The offset doesn't fit into a single register, scavenge one to build the		// The offset doesn't fit into a single register, scavenge one to build the
// offset in.		// offset in.
[... 176 lines not shown ...]	bool PPCRegisterInfo::isFrameOffsetLegal(const MachineInstr *MI,
}		}

unsigned OffsetOperandNo = getOffsetONFromFION(*MI, FIOperandNum);		unsigned OffsetOperandNo = getOffsetONFromFION(*MI, FIOperandNum);
Offset += MI->getOperand(OffsetOperandNo).getImm();		Offset += MI->getOperand(OffsetOperandNo).getImm();

return MI->getOpcode() == PPC::DBG_VALUE || // DBG_VALUE is always Reg+Imm		return MI->getOpcode() == PPC::DBG_VALUE || // DBG_VALUE is always Reg+Imm
MI->getOpcode() == TargetOpcode::STACKMAP ||		MI->getOpcode() == TargetOpcode::STACKMAP ||
MI->getOpcode() == TargetOpcode::PATCHPOINT ||		MI->getOpcode() == TargetOpcode::PATCHPOINT ||
(isInt<16>(Offset) && (!usesIXAddr(*MI) || (Offset & 3) == 0));		(isInt<16>(Offset) && (Offset % offsetMinAlign(*MI)) == 0);
}		}
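
As a rough illustration of the check that eliminateFrameIndex and isFrameOffsetLegal share after this change, the following standalone sketch models it outside of LLVM. The names Form, minOffsetMultiple, fitsInt16, isOffsetEncodable and checkExamples are hypothetical stand-ins invented for this example; the actual patch switches on PPC opcodes in offsetMinAlign and uses isInt<16>.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the opcode groups that offsetMinAlign() switches on.
enum class Form { D, DS, DQ };

// Multiple that the byte displacement must satisfy for each group.
// Mirrors the 1 / 4 / 16 values returned by offsetMinAlign() in the patch.
constexpr int64_t minOffsetMultiple(Form F) {
  return F == Form::DQ ? 16 : F == Form::DS ? 4 : 1;
}

// True if Offset fits in a signed 16-bit immediate (the role of isInt<16>).
constexpr bool fitsInt16(int64_t Offset) {
  return Offset >= INT16_MIN && Offset <= INT16_MAX;
}

// An offset can be folded into the instruction only if it is in range and
// is a multiple of the value required by the instruction's form.
constexpr bool isOffsetEncodable(Form F, int64_t Offset) {
  return fitsInt16(Offset) && Offset % minOffsetMultiple(F) == 0;
}

// Small self-check using offsets that appear in the tests of this patch.
inline void checkExamples() {
  assert(isOffsetEncodable(Form::DQ, 16));   // lxv ..., 16(3) keeps its displacement
  assert(!isOffsetEncodable(Form::DQ, -12)); // -12 is not a multiple of 16
  assert(isOffsetEncodable(Form::DS, 8));    // multiples of 4 are fine for DS
}
```

When the check fails, the code that follows in eliminateFrameIndex materializes the offset in a scavenged register instead of folding it into the instruction.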

llvm/trunk/test/CodeGen/PowerPC/PR33671.ll

				; Function Attrs: norecurse nounwind
				; RUN: llc -mtriple=powerpc64le-unknown-unknown -mcpu=pwr9 < %s | FileCheck %s
				define void @test1(i32* nocapture readonly %arr, i32* nocapture %arrTo) {
				entry:
				%arrayidx = getelementptr inbounds i32, i32* %arrTo, i64 4
				%0 = bitcast i32* %arrayidx to <4 x i32>*
				%arrayidx1 = getelementptr inbounds i32, i32* %arr, i64 4
				%1 = bitcast i32* %arrayidx1 to <4 x i32>*
				%2 = load <4 x i32>, <4 x i32>* %1, align 16
				store <4 x i32> %2, <4 x i32>* %0, align 16
				ret void
				; CHECK-LABEL: test1
				; CHECK: lxv [[LD:[0-9]+]], 16(3)
				; CHECK: stxv [[LD]], 16(4)
				}

				; Function Attrs: norecurse nounwind
				define void @test2(i32* nocapture readonly %arr, i32* nocapture %arrTo) {
				entry:
				%arrayidx = getelementptr inbounds i32, i32* %arrTo, i64 1
				%0 = bitcast i32* %arrayidx to <4 x i32>*
				%arrayidx1 = getelementptr inbounds i32, i32* %arr, i64 2
				%1 = bitcast i32* %arrayidx1 to <4 x i32>*
				%2 = load <4 x i32>, <4 x i32>* %1, align 16
				store <4 x i32> %2, <4 x i32>* %0, align 16
				ret void
				; CHECK-LABEL: test2
				; CHECK: addi 3, 3, 8
				; CHECK: lxvx [[LD:[0-9]+]], 0, 3
				; CHECK: addi 3, 4, 4
				; CHECK: stxvx [[LD]], 0, 3
				}
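
The arithmetic behind the two PR33671 tests can be sketched as follows. This is only a model of why the displacements differ, not of instruction selection itself; the helper names i32ElementOffset, nearestDQDisplacement, isDQRepresentable and checkPR33671Offsets are hypothetical, and the sketch assumes two's-complement integers. Note that the assembler rejects a non-multiple-of-16 displacement rather than silently clearing the low bits.

```cpp
#include <cassert>
#include <cstdint>

// Byte displacement of element Index in an array of i32 (4 bytes each),
// as produced by the getelementptr instructions in the tests.
constexpr int64_t i32ElementOffset(int64_t Index) { return Index * 4; }

// Value left after clearing the low four bits, i.e. the only displacements
// that a field scaled by 16 is able to express.
constexpr int64_t nearestDQDisplacement(int64_t ByteOffset) {
  return ByteOffset & ~int64_t{15};
}

// True if the byte offset is exactly expressible as a quad-word offset.
constexpr bool isDQRepresentable(int64_t ByteOffset) {
  return nearestDQDisplacement(ByteOffset) == ByteOffset;
}

// Offsets taken from test1 and test2 above.
inline void checkPR33671Offsets() {
  assert(isDQRepresentable(i32ElementOffset(4)));  // test1: 16 bytes, lxv/stxv
  assert(!isDQRepresentable(i32ElementOffset(2))); // test2 load: 8 bytes
  assert(!isDQRepresentable(i32ElementOffset(1))); // test2 store: 4 bytes
}
```

This matches the CHECK lines: test1 keeps lxv and stxv with a displacement of 16, while test2 expects an addi followed by the indexed forms lxvx and stxvx.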

llvm/trunk/test/CodeGen/PowerPC/build-vector-tests.ll

[... 1,012 lines not shown ...]	entry:
%3 = load i32, i32* %arrayidx10, align 4		%3 = load i32, i32* %arrayidx10, align 4
%vecinit11 = insertelement <4 x i32> %vecinit7, i32 %3, i32 3		%vecinit11 = insertelement <4 x i32> %vecinit7, i32 %3, i32 3
ret <4 x i32> %vecinit11		ret <4 x i32> %vecinit11
; P9BE-LABEL: fromDiffMemVarDi		; P9BE-LABEL: fromDiffMemVarDi
; P9LE-LABEL: fromDiffMemVarDi		; P9LE-LABEL: fromDiffMemVarDi
; P8BE-LABEL: fromDiffMemVarDi		; P8BE-LABEL: fromDiffMemVarDi
; P8LE-LABEL: fromDiffMemVarDi		; P8LE-LABEL: fromDiffMemVarDi
; P9BE: sldi {{r[0-9]+}}, r4, 2		; P9BE: sldi {{r[0-9]+}}, r4, 2
; P9BE-DAG: lxv {{v[0-9]+}}		; P9BE-DAG: lxvx {{v[0-9]+}}
; P9BE-DAG: lxv		; P9BE-DAG: lxvx
; P9BE: vperm		; P9BE: vperm
; P9BE: blr		; P9BE: blr
; P9LE: sldi {{r[0-9]+}}, r4, 2		; P9LE: sldi {{r[0-9]+}}, r4, 2
; P9LE-DAG: lxv {{v[0-9]+}}		; P9LE-DAG: lxvx {{v[0-9]+}}
; P9LE-DAG: lxv		; P9LE-DAG: lxvx
; P9LE: vperm		; P9LE: vperm
; P9LE: blr		; P9LE: blr
; P8BE: sldi {{r[0-9]+}}, r4, 2		; P8BE: sldi {{r[0-9]+}}, r4, 2
; P8BE-DAG: lxvw4x {{v[0-9]+}}, 0, r3		; P8BE-DAG: lxvw4x {{v[0-9]+}}, 0, r3
; P8BE-DAG: lxvw4x		; P8BE-DAG: lxvw4x
; P8BE: vperm		; P8BE: vperm
; P8BE: blr		; P8BE: blr
; P8LE: sldi {{r[0-9]+}}, r4, 2		; P8LE: sldi {{r[0-9]+}}, r4, 2
[... 543 lines not shown ...]	entry:
%4 = load <2 x double>, <2 x double>* %3, align 8		%4 = load <2 x double>, <2 x double>* %3, align 8
%5 = fptosi <2 x double> %4 to <2 x i32>		%5 = fptosi <2 x double> %4 to <2 x i32>
%vecinit9 = shufflevector <2 x i32> %2, <2 x i32> %5, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		%vecinit9 = shufflevector <2 x i32> %2, <2 x i32> %5, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
ret <4 x i32> %vecinit9		ret <4 x i32> %vecinit9
; P9BE-LABEL: fromDiffMemConsAConvdtoi		; P9BE-LABEL: fromDiffMemConsAConvdtoi
; P9LE-LABEL: fromDiffMemConsAConvdtoi		; P9LE-LABEL: fromDiffMemConsAConvdtoi
; P8BE-LABEL: fromDiffMemConsAConvdtoi		; P8BE-LABEL: fromDiffMemConsAConvdtoi
; P8LE-LABEL: fromDiffMemConsAConvdtoi		; P8LE-LABEL: fromDiffMemConsAConvdtoi
; P9BE: lxv [[REG1:[vs0-9]+]], 0(r3)		; P9BE-DAG: lxv [[REG1:[vs0-9]+]], 0(r3)
; P9BE: lxv [[REG2:[vs0-9]+]], 16(r3)		; P9BE-DAG: lxv [[REG2:[vs0-9]+]], 16(r3)
; P9BE-DAG: xxmrgld [[REG3:[vs0-9]+]], [[REG1]], [[REG2]]		; P9BE-DAG: xxmrgld [[REG3:[vs0-9]+]], [[REG1]], [[REG2]]
; P9BE-DAG: xxmrghd [[REG4:[vs0-9]+]], [[REG1]], [[REG2]]		; P9BE-DAG: xxmrghd [[REG4:[vs0-9]+]], [[REG1]], [[REG2]]
; P9BE-DAG: xvcvdpsp [[REG5:[vs0-9]+]], [[REG3]]		; P9BE-DAG: xvcvdpsp [[REG5:[vs0-9]+]], [[REG3]]
; P9BE-DAG: xvcvdpsp [[REG6:[vs0-9]+]], [[REG4]]		; P9BE-DAG: xvcvdpsp [[REG6:[vs0-9]+]], [[REG4]]
; P9BE: vmrgew v2, [[REG6]], [[REG5]]		; P9BE: vmrgew v2, [[REG6]], [[REG5]]
; P9BE: xvcvspsxws v2, v2		; P9BE: xvcvspsxws v2, v2
; P9LE: lxv [[REG1:[vs0-9]+]], 0(r3)		; P9LE-DAG: lxv [[REG1:[vs0-9]+]], 0(r3)
; P9LE: lxv [[REG2:[vs0-9]+]], 16(r3)		; P9LE-DAG: lxv [[REG2:[vs0-9]+]], 16(r3)
; P9LE-DAG: xxmrgld [[REG3:[vs0-9]+]], [[REG2]], [[REG1]]		; P9LE-DAG: xxmrgld [[REG3:[vs0-9]+]], [[REG2]], [[REG1]]
; P9LE-DAG: xxmrghd [[REG4:[vs0-9]+]], [[REG2]], [[REG1]]		; P9LE-DAG: xxmrghd [[REG4:[vs0-9]+]], [[REG2]], [[REG1]]
; P9LE-DAG: xvcvdpsp [[REG5:[vs0-9]+]], [[REG3]]		; P9LE-DAG: xvcvdpsp [[REG5:[vs0-9]+]], [[REG3]]
; P9LE-DAG: xvcvdpsp [[REG6:[vs0-9]+]], [[REG4]]		; P9LE-DAG: xvcvdpsp [[REG6:[vs0-9]+]], [[REG4]]
; P9LE: vmrgew v2, [[REG6]], [[REG5]]		; P9LE: vmrgew v2, [[REG6]], [[REG5]]
; P9LE: xvcvspsxws v2, v2		; P9LE: xvcvspsxws v2, v2
; P8BE: lxvd2x [[REG1:[vs0-9]+]], 0, r3		; P8BE: lxvd2x [[REG1:[vs0-9]+]], 0, r3
; P8BE: lxvd2x [[REG2:[vs0-9]+]], r3, r4		; P8BE: lxvd2x [[REG2:[vs0-9]+]], r3, r4
[... 567 lines not shown ...]	entry:
%3 = load i32, i32* %arrayidx10, align 4		%3 = load i32, i32* %arrayidx10, align 4
%vecinit11 = insertelement <4 x i32> %vecinit7, i32 %3, i32 3		%vecinit11 = insertelement <4 x i32> %vecinit7, i32 %3, i32 3
ret <4 x i32> %vecinit11		ret <4 x i32> %vecinit11
; P9BE-LABEL: fromDiffMemVarDui		; P9BE-LABEL: fromDiffMemVarDui
; P9LE-LABEL: fromDiffMemVarDui		; P9LE-LABEL: fromDiffMemVarDui
; P8BE-LABEL: fromDiffMemVarDui		; P8BE-LABEL: fromDiffMemVarDui
; P8LE-LABEL: fromDiffMemVarDui		; P8LE-LABEL: fromDiffMemVarDui
; P9BE-DAG: sldi {{r[0-9]+}}, r4, 2		; P9BE-DAG: sldi {{r[0-9]+}}, r4, 2
; P9BE-DAG: lxv {{v[0-9]+}}, -12(r3)		; P9BE-DAG: addi r3, r3, -12
; P9BE-DAG: lxv		; P9BE-DAG: lxvx {{v[0-9]+}}, 0, r3
		; P9BE-DAG: lxvx
; P9BE: vperm		; P9BE: vperm
; P9BE: blr		; P9BE: blr
; P9LE-DAG: sldi {{r[0-9]+}}, r4, 2		; P9LE-DAG: sldi {{r[0-9]+}}, r4, 2
; P9LE-DAG: lxv {{v[0-9]+}}, -12(r3)		; P9LE-DAG: addi r3, r3, -12
		; P9LE-DAG: lxvx {{v[0-9]+}}, 0, r3
; P9LE-DAG: lxv		; P9LE-DAG: lxv
; P9LE: vperm		; P9LE: vperm
; P9LE: blr		; P9LE: blr
; P8BE-DAG: sldi {{r[0-9]+}}, r4, 2		; P8BE-DAG: sldi {{r[0-9]+}}, r4, 2
; P8BE-DAG: lxvw4x {{v[0-9]+}}, 0, r3		; P8BE-DAG: lxvw4x {{v[0-9]+}}, 0, r3
; P8BE-DAG: lxvw4x		; P8BE-DAG: lxvw4x
; P8BE: vperm		; P8BE: vperm
; P8BE: blr		; P8BE: blr
[... 543 lines not shown ...]	entry:
%4 = load <2 x double>, <2 x double>* %3, align 8		%4 = load <2 x double>, <2 x double>* %3, align 8
%5 = fptoui <2 x double> %4 to <2 x i32>		%5 = fptoui <2 x double> %4 to <2 x i32>
%vecinit9 = shufflevector <2 x i32> %2, <2 x i32> %5, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		%vecinit9 = shufflevector <2 x i32> %2, <2 x i32> %5, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
ret <4 x i32> %vecinit9		ret <4 x i32> %vecinit9
; P9BE-LABEL: fromDiffMemConsAConvdtoui		; P9BE-LABEL: fromDiffMemConsAConvdtoui
; P9LE-LABEL: fromDiffMemConsAConvdtoui		; P9LE-LABEL: fromDiffMemConsAConvdtoui
; P8BE-LABEL: fromDiffMemConsAConvdtoui		; P8BE-LABEL: fromDiffMemConsAConvdtoui
; P8LE-LABEL: fromDiffMemConsAConvdtoui		; P8LE-LABEL: fromDiffMemConsAConvdtoui
; P9BE: lxv [[REG1:[vs0-9]+]], 0(r3)		; P9BE-DAG: lxv [[REG1:[vs0-9]+]], 0(r3)
; P9BE: lxv [[REG2:[vs0-9]+]], 16(r3)		; P9BE-DAG: lxv [[REG2:[vs0-9]+]], 16(r3)
; P9BE-DAG: xxmrgld [[REG3:[vs0-9]+]], [[REG1]], [[REG2]]		; P9BE-DAG: xxmrgld [[REG3:[vs0-9]+]], [[REG1]], [[REG2]]
; P9BE-DAG: xxmrghd [[REG4:[vs0-9]+]], [[REG1]], [[REG2]]		; P9BE-DAG: xxmrghd [[REG4:[vs0-9]+]], [[REG1]], [[REG2]]
; P9BE-DAG: xvcvdpsp [[REG5:[vs0-9]+]], [[REG3]]		; P9BE-DAG: xvcvdpsp [[REG5:[vs0-9]+]], [[REG3]]
; P9BE-DAG: xvcvdpsp [[REG6:[vs0-9]+]], [[REG4]]		; P9BE-DAG: xvcvdpsp [[REG6:[vs0-9]+]], [[REG4]]
; P9BE: vmrgew v2, [[REG6]], [[REG5]]		; P9BE: vmrgew v2, [[REG6]], [[REG5]]
; P9BE: xvcvspuxws v2, v2		; P9BE: xvcvspuxws v2, v2
; P9LE: lxv [[REG1:[vs0-9]+]], 0(r3)		; P9LE-DAG: lxv [[REG1:[vs0-9]+]], 0(r3)
; P9LE: lxv [[REG2:[vs0-9]+]], 16(r3)		; P9LE-DAG: lxv [[REG2:[vs0-9]+]], 16(r3)
; P9LE-DAG: xxmrgld [[REG3:[vs0-9]+]], [[REG2]], [[REG1]]		; P9LE-DAG: xxmrgld [[REG3:[vs0-9]+]], [[REG2]], [[REG1]]
; P9LE-DAG: xxmrghd [[REG4:[vs0-9]+]], [[REG2]], [[REG1]]		; P9LE-DAG: xxmrghd [[REG4:[vs0-9]+]], [[REG2]], [[REG1]]
; P9LE-DAG: xvcvdpsp [[REG5:[vs0-9]+]], [[REG3]]		; P9LE-DAG: xvcvdpsp [[REG5:[vs0-9]+]], [[REG3]]
; P9LE-DAG: xvcvdpsp [[REG6:[vs0-9]+]], [[REG4]]		; P9LE-DAG: xvcvdpsp [[REG6:[vs0-9]+]], [[REG4]]
; P9LE: vmrgew v2, [[REG6]], [[REG5]]		; P9LE: vmrgew v2, [[REG6]], [[REG5]]
; P9LE: xvcvspuxws v2, v2		; P9LE: xvcvspuxws v2, v2
; P8BE: lxvd2x [[REG1:[vs0-9]+]], 0, r3		; P8BE: lxvd2x [[REG1:[vs0-9]+]], 0, r3
; P8BE: lxvd2x [[REG2:[vs0-9]+]], r3, r4		; P8BE: lxvd2x [[REG2:[vs0-9]+]], r3, r4
[... 698 lines not shown ...]
; Function Attrs: norecurse nounwind readnone		; Function Attrs: norecurse nounwind readnone
define <2 x i64> @fromDiffConstsConvftoll() {		define <2 x i64> @fromDiffConstsConvftoll() {
entry:		entry:
ret <2 x i64> <i64 24, i64 234>		ret <2 x i64> <i64 24, i64 234>
; P9BE-LABEL: fromDiffConstsConvftoll		; P9BE-LABEL: fromDiffConstsConvftoll
; P9LE-LABEL: fromDiffConstsConvftoll		; P9LE-LABEL: fromDiffConstsConvftoll
; P8BE-LABEL: fromDiffConstsConvftoll		; P8BE-LABEL: fromDiffConstsConvftoll
; P8LE-LABEL: fromDiffConstsConvftoll		; P8LE-LABEL: fromDiffConstsConvftoll
; P9BE: lxv v2		; P9BE: lxvx v2
; P9BE: blr		; P9BE: blr
; P9LE: lxv v2		; P9LE: lxvx v2
; P9LE: blr		; P9LE: blr
; P8BE: lxvd2x v2		; P8BE: lxvd2x v2
; P8BE: blr		; P8BE: blr
; P8LE: lxvd2x		; P8LE: lxvd2x
; P8LE: xxswapd v2		; P8LE: xxswapd v2
; P8LE: blr		; P8LE: blr
}		}

[... 885 lines not shown ...]
; Function Attrs: norecurse nounwind readnone		; Function Attrs: norecurse nounwind readnone
define <2 x i64> @fromDiffConstsConvftoull() {		define <2 x i64> @fromDiffConstsConvftoull() {
entry:		entry:
ret <2 x i64> <i64 24, i64 234>		ret <2 x i64> <i64 24, i64 234>
; P9BE-LABEL: fromDiffConstsConvftoull		; P9BE-LABEL: fromDiffConstsConvftoull
; P9LE-LABEL: fromDiffConstsConvftoull		; P9LE-LABEL: fromDiffConstsConvftoull
; P8BE-LABEL: fromDiffConstsConvftoull		; P8BE-LABEL: fromDiffConstsConvftoull
; P8LE-LABEL: fromDiffConstsConvftoull		; P8LE-LABEL: fromDiffConstsConvftoull
; P9BE: lxv v2		; P9BE: lxvx v2
; P9BE: blr		; P9BE: blr
; P9LE: lxv v2		; P9LE: lxvx v2
; P9LE: blr		; P9LE: blr
; P8BE: lxvd2x v2		; P8BE: lxvd2x v2
; P8BE: blr		; P8BE: blr
; P8LE: lxvd2x		; P8LE: lxvd2x
; P8LE: xxswapd v2		; P8LE: xxswapd v2
; P8LE: blr		; P8LE: blr
}		}

[... 469 lines not shown ...]

llvm/trunk/test/CodeGen/PowerPC/ppc64-i128-abi.ll

	[... 57 lines not shown ...]
	; CHECK-LE: blr			; CHECK-LE: blr

	; CHECK-P9-LABEL: @v1i128_increment_by_one			; CHECK-P9-LABEL: @v1i128_increment_by_one
	; The below FIXME is due to the lowering for BUILD_VECTOR that will be fixed			; The below FIXME is due to the lowering for BUILD_VECTOR that will be fixed
	; in a subsequent patch.			; in a subsequent patch.
	; FIXME: li [[R1:r[0-9]+]], 1			; FIXME: li [[R1:r[0-9]+]], 1
	; FIXME: li [[R2:r[0-9]+]], 0			; FIXME: li [[R2:r[0-9]+]], 0
	; FIXME: mtvsrdd [[V1:v[0-9]+]], [[R2]], [[R1]]			; FIXME: mtvsrdd [[V1:v[0-9]+]], [[R2]], [[R1]]
	; CHECK-P9: lxv [[V1:v[0-9]+]]			; CHECK-P9: lxvx [[V1:v[0-9]+]]
	; CHECK-P9: vadduqm v2, v2, [[V1]]			; CHECK-P9: vadduqm v2, v2, [[V1]]
	; CHECK-P9: blr			; CHECK-P9: blr

	; CHECK-BE-LABEL: @v1i128_increment_by_one			; CHECK-BE-LABEL: @v1i128_increment_by_one
	; CHECK-BE: lxvd2x 35, {{[0-9]+}}, {{[0-9]+}}			; CHECK-BE: lxvd2x 35, {{[0-9]+}}, {{[0-9]+}}
	; CHECK-BE-NOT: xxswapd			; CHECK-BE-NOT: xxswapd
	; CHECK-BE: vadduqm 2, 2, 3			; CHECK-BE: vadduqm 2, 2, 3
	; CHECK-BE-NOT: xxswapd 34, {{[0-9]+}}			; CHECK-BE-NOT: xxswapd 34, {{[0-9]+}}
	[... 157 lines not shown ...]

	; CHECK-LE-LABEL: @call_v1i128_increment_by_val			; CHECK-LE-LABEL: @call_v1i128_increment_by_val
	; CHECK-LE: lvx 2, {{[0-9]+}}, {{[0-9]+}}			; CHECK-LE: lvx 2, {{[0-9]+}}, {{[0-9]+}}
	; CHECK-LE: lvx 3, {{[0-9]+}}, {{[0-9]+}}			; CHECK-LE: lvx 3, {{[0-9]+}}, {{[0-9]+}}
	; CHECK-LE: bl v1i128_increment_by_val			; CHECK-LE: bl v1i128_increment_by_val
	; CHECK-LE: blr			; CHECK-LE: blr

	; CHECK-P9-LABEL: @call_v1i128_increment_by_val			; CHECK-P9-LABEL: @call_v1i128_increment_by_val
	; CHECK-P9-DAG: lxv v2			; CHECK-P9-DAG: lxvx v2
	; CHECK-P9-DAG: lxv v3			; CHECK-P9-DAG: lxvx v3
	; CHECK-P9: bl v1i128_increment_by_val			; CHECK-P9: bl v1i128_increment_by_val
	; CHECK-P9: blr			; CHECK-P9: blr

	; CHECK-BE-LABEL: @call_v1i128_increment_by_val			; CHECK-BE-LABEL: @call_v1i128_increment_by_val


	; CHECK-BE-DAG: lxvw4x 35, {{[0-9]+}}, {{[0-9]+}}			; CHECK-BE-DAG: lxvw4x 35, {{[0-9]+}}, {{[0-9]+}}
	; CHECK-BE-NOT: xxswapd 34, {{[0-9]+}}			; CHECK-BE-NOT: xxswapd 34, {{[0-9]+}}
	[... 69 lines not shown ...]

llvm/trunk/test/CodeGen/PowerPC/swaps-le-6.ll

	[... 27 lines not shown ...]
	; CHECK-LABEL: @bar0			; CHECK-LABEL: @bar0
	; CHECK-DAG: lxvd2x [[REG1:[0-9]+]]			; CHECK-DAG: lxvd2x [[REG1:[0-9]+]]
	; CHECK-DAG: lxsdx [[REG2:[0-9]+]]			; CHECK-DAG: lxsdx [[REG2:[0-9]+]]
	; CHECK: xxspltd [[REG4:[0-9]+]], [[REG2]], 0			; CHECK: xxspltd [[REG4:[0-9]+]], [[REG2]], 0
	; CHECK: xxpermdi [[REG5:[0-9]+]], [[REG4]], [[REG1]], 1			; CHECK: xxpermdi [[REG5:[0-9]+]], [[REG4]], [[REG1]], 1
	; CHECK: stxvd2x [[REG5]]			; CHECK: stxvd2x [[REG5]]

	; CHECK-P9-LABEL: @bar0			; CHECK-P9-LABEL: @bar0
	; CHECK-P9-DAG: lxv [[REG1:[0-9]+]]			; CHECK-P9-DAG: lxvx [[REG1:[0-9]+]]
	; CHECK-P9-DAG: lfd [[REG2:[0-9]+]], 0(3)			; CHECK-P9-DAG: lfd [[REG2:[0-9]+]], 0(3)
	; CHECK-P9: xxspltd [[REG4:[0-9]+]], [[REG2]], 0			; CHECK-P9: xxspltd [[REG4:[0-9]+]], [[REG2]], 0
	; CHECK-P9: xxpermdi [[REG5:[0-9]+]], [[REG1]], [[REG4]], 1			; CHECK-P9: xxpermdi [[REG5:[0-9]+]], [[REG1]], [[REG4]], 1
	; CHECK-P9: stxv [[REG5]]			; CHECK-P9: stxvx [[REG5]]

	define void @bar1() {			define void @bar1() {
	entry:			entry:
	%0 = load <2 x double>, <2 x double>* @x, align 16			%0 = load <2 x double>, <2 x double>* @x, align 16
	%1 = load double, double* @y, align 8			%1 = load double, double* @y, align 8
	%vecins = insertelement <2 x double> %0, double %1, i32 1			%vecins = insertelement <2 x double> %0, double %1, i32 1
	store <2 x double> %vecins, <2 x double>* @z, align 16			store <2 x double> %vecins, <2 x double>* @z, align 16
	ret void			ret void
	}			}

	; CHECK-LABEL: @bar1			; CHECK-LABEL: @bar1
	; CHECK-DAG: lxvd2x [[REG1:[0-9]+]]			; CHECK-DAG: lxvd2x [[REG1:[0-9]+]]
	; CHECK-DAG: lxsdx [[REG2:[0-9]+]]			; CHECK-DAG: lxsdx [[REG2:[0-9]+]]
	; CHECK: xxspltd [[REG4:[0-9]+]], [[REG2]], 0			; CHECK: xxspltd [[REG4:[0-9]+]], [[REG2]], 0
	; CHECK: xxmrghd [[REG5:[0-9]+]], [[REG1]], [[REG4]]			; CHECK: xxmrghd [[REG5:[0-9]+]], [[REG1]], [[REG4]]
	; CHECK: stxvd2x [[REG5]]			; CHECK: stxvd2x [[REG5]]

	; CHECK-P9-LABEL: @bar1			; CHECK-P9-LABEL: @bar1
	; CHECK-P9-DAG: lxv [[REG1:[0-9]+]]			; CHECK-P9-DAG: lxvx [[REG1:[0-9]+]]
	; CHECK-P9-DAG: lfd [[REG2:[0-9]+]], 0(3)			; CHECK-P9-DAG: lfd [[REG2:[0-9]+]], 0(3)
	; CHECK-P9: xxspltd [[REG4:[0-9]+]], [[REG2]], 0			; CHECK-P9: xxspltd [[REG4:[0-9]+]], [[REG2]], 0
	; CHECK-P9: xxmrgld [[REG5:[0-9]+]], [[REG4]], [[REG1]]			; CHECK-P9: xxmrgld [[REG5:[0-9]+]], [[REG4]], [[REG1]]
	; CHECK-P9: stxv [[REG5]]			; CHECK-P9: stxvx [[REG5]]

llvm/trunk/test/CodeGen/PowerPC/vsx-p9.ll

	[... 30 lines not shown ...]

	define void @_Z4testv() {			define void @_Z4testv() {
	entry:			entry:
	; CHECK-LABEL: @_Z4testv			; CHECK-LABEL: @_Z4testv
	%0 = load <16 x i8>, <16 x i8>* @uca, align 16			%0 = load <16 x i8>, <16 x i8>* @uca, align 16
	%1 = load <16 x i8>, <16 x i8>* @ucb, align 16			%1 = load <16 x i8>, <16 x i8>* @ucb, align 16
	%add.i = add <16 x i8> %1, %0			%add.i = add <16 x i8> %1, %0
	tail call void (...) @sink(<16 x i8> %add.i)			tail call void (...) @sink(<16 x i8> %add.i)
	; CHECK: lxv 34, 0(3)			; CHECK: lxvx 34, 0, 3
	; CHECK: lxv 35, 0(4)			; CHECK: lxvx 35, 0, 4
	; CHECK: vaddubm 2, 3, 2			; CHECK: vaddubm 2, 3, 2
	; CHECK: stxv 34,			; CHECK: stxv 34,
	; CHECK: bl sink			; CHECK: bl sink
	%2 = load <16 x i8>, <16 x i8>* @sca, align 16			%2 = load <16 x i8>, <16 x i8>* @sca, align 16
	%3 = load <16 x i8>, <16 x i8>* @scb, align 16			%3 = load <16 x i8>, <16 x i8>* @scb, align 16
	%add.i22 = add <16 x i8> %3, %2			%add.i22 = add <16 x i8> %3, %2
	tail call void (...) @sink(<16 x i8> %add.i22)			tail call void (...) @sink(<16 x i8> %add.i22)
	; CHECK: lxv 34, 0(3)			; CHECK: lxvx 34, 0, 3
	; CHECK: lxv 35, 0(4)			; CHECK: lxvx 35, 0, 4
	; CHECK: vaddubm 2, 3, 2			; CHECK: vaddubm 2, 3, 2
	; CHECK: stxv 34,			; CHECK: stxv 34,
	; CHECK: bl sink			; CHECK: bl sink
	%4 = load <8 x i16>, <8 x i16>* @usa, align 16			%4 = load <8 x i16>, <8 x i16>* @usa, align 16
	%5 = load <8 x i16>, <8 x i16>* @usb, align 16			%5 = load <8 x i16>, <8 x i16>* @usb, align 16
	%add.i21 = add <8 x i16> %5, %4			%add.i21 = add <8 x i16> %5, %4
	tail call void (...) @sink(<8 x i16> %add.i21)			tail call void (...) @sink(<8 x i16> %add.i21)
	; CHECK: lxv 34, 0(3)			; CHECK: lxvx 34, 0, 3
	; CHECK: lxv 35, 0(4)			; CHECK: lxvx 35, 0, 4
	; CHECK: vadduhm 2, 3, 2			; CHECK: vadduhm 2, 3, 2
	; CHECK: stxv 34,			; CHECK: stxv 34,
	; CHECK: bl sink			; CHECK: bl sink
	%6 = load <8 x i16>, <8 x i16>* @ssa, align 16			%6 = load <8 x i16>, <8 x i16>* @ssa, align 16
	%7 = load <8 x i16>, <8 x i16>* @ssb, align 16			%7 = load <8 x i16>, <8 x i16>* @ssb, align 16
	%add.i20 = add <8 x i16> %7, %6			%add.i20 = add <8 x i16> %7, %6
	tail call void (...) @sink(<8 x i16> %add.i20)			tail call void (...) @sink(<8 x i16> %add.i20)
	; CHECK: lxv 34, 0(3)			; CHECK: lxvx 34, 0, 3
	; CHECK: lxv 35, 0(4)			; CHECK: lxvx 35, 0, 4
	; CHECK: vadduhm 2, 3, 2			; CHECK: vadduhm 2, 3, 2
	; CHECK: stxv 34,			; CHECK: stxv 34,
	; CHECK: bl sink			; CHECK: bl sink
	%8 = load <4 x i32>, <4 x i32>* @uia, align 16			%8 = load <4 x i32>, <4 x i32>* @uia, align 16
	%9 = load <4 x i32>, <4 x i32>* @uib, align 16			%9 = load <4 x i32>, <4 x i32>* @uib, align 16
	%add.i19 = add <4 x i32> %9, %8			%add.i19 = add <4 x i32> %9, %8
	tail call void (...) @sink(<4 x i32> %add.i19)			tail call void (...) @sink(<4 x i32> %add.i19)
	; CHECK: lxv 34, 0(3)			; CHECK: lxvx 34, 0, 3
	; CHECK: lxv 35, 0(4)			; CHECK: lxvx 35, 0, 4
	; CHECK: vadduwm 2, 3, 2			; CHECK: vadduwm 2, 3, 2
	; CHECK: stxv 34,			; CHECK: stxv 34,
	; CHECK: bl sink			; CHECK: bl sink
	%10 = load <4 x i32>, <4 x i32>* @sia, align 16			%10 = load <4 x i32>, <4 x i32>* @sia, align 16
	%11 = load <4 x i32>, <4 x i32>* @sib, align 16			%11 = load <4 x i32>, <4 x i32>* @sib, align 16
	%add.i18 = add <4 x i32> %11, %10			%add.i18 = add <4 x i32> %11, %10
	tail call void (...) @sink(<4 x i32> %add.i18)			tail call void (...) @sink(<4 x i32> %add.i18)
	; CHECK: lxv 34, 0(3)			; CHECK: lxvx 34, 0, 3
	; CHECK: lxv 35, 0(4)			; CHECK: lxvx 35, 0, 4
	; CHECK: vadduwm 2, 3, 2			; CHECK: vadduwm 2, 3, 2
	; CHECK: stxv 34,			; CHECK: stxv 34,
	; CHECK: bl sink			; CHECK: bl sink
	%12 = load <2 x i64>, <2 x i64>* @ulla, align 16			%12 = load <2 x i64>, <2 x i64>* @ulla, align 16
	%13 = load <2 x i64>, <2 x i64>* @ullb, align 16			%13 = load <2 x i64>, <2 x i64>* @ullb, align 16
	%add.i17 = add <2 x i64> %13, %12			%add.i17 = add <2 x i64> %13, %12
	tail call void (...) @sink(<2 x i64> %add.i17)			tail call void (...) @sink(<2 x i64> %add.i17)
	; CHECK: lxv 34, 0(3)			; CHECK: lxvx 34, 0, 3
	; CHECK: lxv 35, 0(4)			; CHECK: lxvx 35, 0, 4
	; CHECK: vaddudm 2, 3, 2			; CHECK: vaddudm 2, 3, 2
	; CHECK: stxv 34,			; CHECK: stxv 34,
	; CHECK: bl sink			; CHECK: bl sink
	%14 = load <2 x i64>, <2 x i64>* @slla, align 16			%14 = load <2 x i64>, <2 x i64>* @slla, align 16
	%15 = load <2 x i64>, <2 x i64>* @sllb, align 16			%15 = load <2 x i64>, <2 x i64>* @sllb, align 16
	%add.i16 = add <2 x i64> %15, %14			%add.i16 = add <2 x i64> %15, %14
	tail call void (...) @sink(<2 x i64> %add.i16)			tail call void (...) @sink(<2 x i64> %add.i16)
	; CHECK: lxv 34, 0(3)			; CHECK: lxvx 34, 0, 3
	; CHECK: lxv 35, 0(4)			; CHECK: lxvx 35, 0, 4
	; CHECK: vaddudm 2, 3, 2			; CHECK: vaddudm 2, 3, 2
	; CHECK: stxv 34,			; CHECK: stxv 34,
	; CHECK: bl sink			; CHECK: bl sink
	%16 = load <1 x i128>, <1 x i128>* @uxa, align 16			%16 = load <1 x i128>, <1 x i128>* @uxa, align 16
	%17 = load <1 x i128>, <1 x i128>* @uxb, align 16			%17 = load <1 x i128>, <1 x i128>* @uxb, align 16
	%add.i15 = add <1 x i128> %17, %16			%add.i15 = add <1 x i128> %17, %16
	tail call void (...) @sink(<1 x i128> %add.i15)			tail call void (...) @sink(<1 x i128> %add.i15)
	; CHECK: lxv 34, 0(3)			; CHECK: lxvx 34, 0, 3
	; CHECK: lxv 35, 0(4)			; CHECK: lxvx 35, 0, 4
	; CHECK: vadduqm 2, 3, 2			; CHECK: vadduqm 2, 3, 2
	; CHECK: stxv 34,			; CHECK: stxv 34,
	; CHECK: bl sink			; CHECK: bl sink
	%18 = load <1 x i128>, <1 x i128>* @sxa, align 16			%18 = load <1 x i128>, <1 x i128>* @sxa, align 16
	%19 = load <1 x i128>, <1 x i128>* @sxb, align 16			%19 = load <1 x i128>, <1 x i128>* @sxb, align 16
	%add.i14 = add <1 x i128> %19, %18			%add.i14 = add <1 x i128> %19, %18
	tail call void (...) @sink(<1 x i128> %add.i14)			tail call void (...) @sink(<1 x i128> %add.i14)
	; CHECK: lxv 34, 0(3)			; CHECK: lxvx 34, 0, 3
	; CHECK: lxv 35, 0(4)			; CHECK: lxvx 35, 0, 4
	; CHECK: vadduqm 2, 3, 2			; CHECK: vadduqm 2, 3, 2
	; CHECK: stxv 34,			; CHECK: stxv 34,
	; CHECK: bl sink			; CHECK: bl sink
	%20 = load <4 x float>, <4 x float>* @vfa, align 16			%20 = load <4 x float>, <4 x float>* @vfa, align 16
	%21 = load <4 x float>, <4 x float>* @vfb, align 16			%21 = load <4 x float>, <4 x float>* @vfb, align 16
	%add.i13 = fadd <4 x float> %20, %21			%add.i13 = fadd <4 x float> %20, %21
	tail call void (...) @sink(<4 x float> %add.i13)			tail call void (...) @sink(<4 x float> %add.i13)
	; CHECK: lxv 0, 0(3)			; CHECK: lxvx 0, 0, 3
	; CHECK: lxv 1, 0(4)			; CHECK: lxvx 1, 0, 4
	; CHECK: xvaddsp 34, 0, 1			; CHECK: xvaddsp 34, 0, 1
	; CHECK: stxv 34,			; CHECK: stxv 34,
	; CHECK: bl sink			; CHECK: bl sink
	%22 = load <2 x double>, <2 x double>* @vda, align 16			%22 = load <2 x double>, <2 x double>* @vda, align 16
	%23 = load <2 x double>, <2 x double>* @vdb, align 16			%23 = load <2 x double>, <2 x double>* @vdb, align 16
	%add.i12 = fadd <2 x double> %22, %23			%add.i12 = fadd <2 x double> %22, %23
	tail call void (...) @sink(<2 x double> %add.i12)			tail call void (...) @sink(<2 x double> %add.i12)
	; CHECK: lxv 0, 0(3)			; CHECK: lxvx 0, 0, 3
	; CHECK: lxv 1, 0(4)			; CHECK: lxvx 1, 0, 4
	; CHECK: xvadddp 0, 0, 1			; CHECK: xvadddp 0, 0, 1
	; CHECK: stxv 0,			; CHECK: stxv 0,
	; CHECK: bl sink			; CHECK: bl sink
	ret void			ret void
	}			}

	; Function Attrs: nounwind readnone			; Function Attrs: nounwind readnone
	define <4 x float> @testXVIEXPSP(<4 x i32> %a, <4 x i32> %b) {			define <4 x float> @testXVIEXPSP(<4 x i32> %a, <4 x i32> %b) {