This is an archive of the discontinued LLVM Phabricator instance.


Differential D34193

[PowerPC] peephole optimization on use after register copy
Abandoned, Public

Authored by inouehrs on Jun 13 2017, 10:48 PM.

Details

Reviewers

kbarton
nemanjai
sfertile
jtony
lei
stefanp
syzaara
MatzeB
echristo
hfinkel

Summary

This patch adds a new PowerPC-specific peephole optimization pass that runs after register allocation.
The new pass rewrites register operands of instructions that follow a register copy in order to reduce latency.
For example, in code sequences like

mr X, Y
(no update in X or Y)
addi Z, X, 1

this pass rewrites addi Z, X, 1 to addi Z, Y, 1 so that the addi no longer depends on the preceding mr instruction.
Typically, such register copies are for method parameters or a return value (e.g., copying a method parameter to another register before its first use).
This pattern appears quite frequently; this optimization rewrites more than 100k instructions while compiling LLVM+Clang.
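
As a rough illustration of the rewrite described above, here is a minimal standalone sketch over a toy instruction list. The `Inst` struct, the `forwardCopies` function, and the string register names are hypothetical and are not part of the patch; the actual pass works on `MachineInstr` operands and additionally handles sub/super-registers, kill flags, implicit definitions, and calls.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy instruction: one destination register and a list of source registers.
// IsCopy marks a register-to-register move such as `mr X, Y`.
// Immediate operands are omitted for brevity.
struct Inst {
  std::string Opcode;
  std::string Dst;
  std::vector<std::string> Srcs;
  bool IsCopy;
};

// Rewrites uses of a copy's destination into uses of the copy's source within
// a single basic block. Returns the number of rewritten operands.
int forwardCopies(std::vector<Inst> &Block) {
  // Each pair is (source, destination) of a copy whose two registers are
  // still known to hold the same value.
  std::vector<std::pair<std::string, std::string>> Pairs;
  int Rewritten = 0;

  for (Inst &I : Block) {
    // Copies themselves are left untouched; only later users are rewritten.
    if (!I.IsCopy) {
      for (std::string &Src : I.Srcs) {
        for (const auto &P : Pairs) {
          if (Src == P.second) {
            Src = P.first; // read the copy source directly
            ++Rewritten;
            break;
          }
        }
      }
    }

    // A write to either register of a pair ends the equivalence.
    std::vector<std::pair<std::string, std::string>> Live;
    for (const auto &P : Pairs)
      if (P.first != I.Dst && P.second != I.Dst)
        Live.push_back(P);
    Pairs = std::move(Live);

    // Record the new equivalence created by a copy.
    if (I.IsCopy) {
      assert(I.Srcs.size() == 1 && "a copy has exactly one source");
      if (I.Srcs[0] != I.Dst)
        Pairs.emplace_back(I.Srcs[0], I.Dst);
    }
  }
  return Rewritten;
}
```

With the block `mr X, Y; addi Z, X`, the sketch rewrites the `addi` to read `Y`. If an instruction that writes `Y` (or `X`) appears between the two, the pair is dropped and nothing is rewritten.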

Performance improved on most of the SPEC CPU benchmarks on POWER8.

Benchmark	Improvement
I400.perlbench	0.03%
I401.bzip2	0.28%
I403.gcc	0.32%
I429.mcf	1.43%
I445.gobmk	-0.05%
I456.hmmer	0.13%
I458.sjeng	0.06%
I462.libquantum	0.70%
I464.h264ref	0.03%
I471.omnetpp	0.37%
I473.astar	-0.03%
I483.xalancbmk	0.85%
f433.milc	0.70%
f444.namd	0.06%
f447.dealII	0.57%
f450.soplex	-0.74%
f453.povray	0.28%
f470.lbm	0.06%
f482.sphinx3	0.78%
SPECINT	0.34%
SPECFP	0.25%
TOTAL	0.31%

(Average of 8 runs. A positive number means improvement by the patch.)

Event Timeline

inouehrs created this revision. Jun 13 2017, 10:48 PM

Herald added a subscriber: mgorny. Jun 13 2017, 10:48 PM

lei added inline comments. Jun 14 2017, 11:31 AM

lib/Target/PowerPC/PPCPostRAPeephole.cpp
Line 9: The comments below should be doxygen comments, identified by "///" instead of the usual "//".

Would this be covered by https://reviews.llvm.org/D30751 as well?

@MatzeB My optimization seems to be mostly covered by D30751. Thank you for pointing this out.

inouehrs abandoned this revision. Jun 15 2017, 7:10 PM

Revision Contents

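Path	Size
lib/Target/PowerPC/CMakeLists.txt	1 line
lib/Target/PowerPC/PPC.h	1 line
lib/Target/PowerPC/PPCPostRAPeephole.cpp	279 lines
lib/Target/PowerPC/PPCTargetMachine.cpp	8 lines
test/CodeGen/PowerPC/Frames-large.ll	4 lines
test/CodeGen/PowerPC/eh-dwarf-cfa.ll	2 lines
test/CodeGen/PowerPC/fma-mutate.ll	2 lines
test/CodeGen/PowerPC/save-cr-ppc32svr4.ll	6 lines
test/CodeGen/PowerPC/sjlj.ll	4 lines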

Diff 102476

lib/Target/PowerPC/CMakeLists.txt

(23 lines not shown)
  add_llvm_target(PowerPCCodeGen
    PPCISelLowering.cpp
    PPCEarlyReturn.cpp
    PPCFastISel.cpp
    PPCFrameLowering.cpp
    PPCLoopPreIncPrep.cpp
    PPCMCInstLower.cpp
    PPCMachineFunctionInfo.cpp
    PPCMIPeephole.cpp
+   PPCPostRAPeephole.cpp
    PPCRegisterInfo.cpp
    PPCQPXLoadSplat.cpp
    PPCSubtarget.cpp
    PPCTargetMachine.cpp
    PPCTargetObjectFile.cpp
    PPCTargetTransformInfo.cpp
    PPCTOCRegDeps.cpp
    PPCTLSDynamicCall.cpp
(11 lines not shown)

lib/Target/PowerPC/PPC.h

(34 lines not shown)
  #endif
  FunctionPass *createPPCLoopPreIncPrepPass(PPCTargetMachine &TM);
  FunctionPass *createPPCTOCRegDepsPass();
  FunctionPass *createPPCEarlyReturnPass();
  FunctionPass *createPPCVSXCopyPass();
  FunctionPass *createPPCVSXFMAMutatePass();
  FunctionPass *createPPCVSXSwapRemovalPass();
  FunctionPass *createPPCMIPeepholePass();
+ FunctionPass *createPPCPostRAPeepholePass();
  FunctionPass *createPPCBranchSelectionPass();
  FunctionPass *createPPCQPXLoadSplatPass();
  FunctionPass *createPPCISelDag(PPCTargetMachine &TM);
  FunctionPass *createPPCTLSDynamicCallPass();
  FunctionPass *createPPCBoolRetToIntPass();
  FunctionPass *createPPCExpandISELPass();
  void LowerPPCMachineInstrToMCInst(const MachineInstr *MI, MCInst &OutMI,
                                    AsmPrinter &AP, bool isDarwin);
(56 lines not shown)

lib/Target/PowerPC/PPCPostRAPeephole.cpp

This file was added.

				//===-------------- PPCPostRAPeephole.cpp - MI Peephole Cleanups -------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===---------------------------------------------------------------------===//
				//
				// This pass modifies register operands of instructions after a register copy.
				// For example, in code sequences like
				// mr X, Y
				// (no update in X or Y)
				// addi Z, X, 1
				// this pass updates `addi Z, X, 1` to `addi Z, Y, 1` to increase ILP.
				//
				//===---------------------------------------------------------------------===//

				#include "PPCInstrInfo.h"
				#include "PPC.h"
				#include "PPCInstrBuilder.h"
				#include "PPCTargetMachine.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/Support/Debug.h"

				using namespace llvm;

				#define DEBUG_TYPE "ppc-postra-peepholes"

				namespace llvm {
				void initializePPCPostRAPeepholePass(PassRegistry&);
				}

				namespace {

				struct PPCPostRAPeephole : public MachineFunctionPass {

				static char ID;
				const PPCInstrInfo *TII;
				MachineFunction *MF;

				PPCPostRAPeephole() : MachineFunctionPass(ID) {
				initializePPCPostRAPeepholePass(*PassRegistry::getPassRegistry());
				}

				private:
				// Initialize class variables.
				void initialize(MachineFunction &MFParm);

				// Perform peepholes.
				bool optUseAfterRegCopy(void);

				public:
				// Main entry point for this pass.
				bool runOnMachineFunction(MachineFunction &MF) override {
				if (skipFunction(*MF.getFunction()))
				return false;
				initialize(MF);
				bool changed = false;

				// opt use after regcopy
				changed |= optUseAfterRegCopy();

				return changed;
				}
				};

				// A class representing two physical registers having the same value.
				class PPCEquiRegPairInfo {
				public:
				PPCEquiRegPairInfo(unsigned From, unsigned To):
				SrcReg(From), DstReg(To), SrcRegKilled(false) {}

				unsigned getSrcReg() { return SrcReg; }
				unsigned getDstReg() { return DstReg; }
				bool needSrcRegKillFlag() { return SrcRegKilled; }

				// It returns true if the specified register conflicts with SrcReg or DstReg.
				bool isConflict(const unsigned Reg, const TargetRegisterInfo *TRI) const {
				if (Reg == SrcReg || Reg == DstReg ||
				TRI->isSuperOrSubRegisterEq(Reg, SrcReg) ||
				TRI->isSuperOrSubRegisterEq(Reg, DstReg))
				return true;
				return false;
				}

				// It returns true if the specified instruction updates SrcReg or DstReg.
				bool isConflict(const MachineInstr &MI, const TargetRegisterInfo *TRI,
				const PPCInstrInfo *TII) const {
				const MCInstrDesc &MCID = TII->get(MI.getOpcode());
				for (unsigned I = 0; I < MCID.getNumDefs(); I++)
				if (isConflict(MI.getOperand(I).getReg(), TRI))
				return true;

				// The implicit-def list is a zero-terminated array of physical registers.
				if (MCID.ImplicitDefs)
				for (const MCPhysReg *ImpDef = MCID.getImplicitDefs(); *ImpDef; ImpDef++)
				if (isConflict(*ImpDef, TRI))
				return true;

				return false;
				}

				// If we see a kill flag for SrcReg, we remember it
				// to maintain kill flag later.
				void checkKillFlags(const MachineOperand &MO,
				const TargetRegisterInfo *TRI) {
				assert(MO.isKill());
				if (TRI->isSuperOrSubRegisterEq(MO.getReg(), SrcReg))
				SrcRegKilled = true;
				}

				protected:
				unsigned SrcReg;
				unsigned DstReg;
				bool SrcRegKilled;
				};

				// Initialize class variables.
				void PPCPostRAPeephole::initialize(MachineFunction &MFParm) {
				MF = &MFParm;
				TII = MF->getSubtarget<PPCSubtarget>().getInstrInfo();
				DEBUG(dbgs() << "* PowerPC Post RA peephole pass *\n\n");
				DEBUG(MF->dump());
				}

				// Perform peephole optimizations.
				bool PPCPostRAPeephole::optUseAfterRegCopy(void) {
				const PPCSubtarget *PPCSubTarget = &MF->getSubtarget<PPCSubtarget>();
				const TargetRegisterInfo *TRI = PPCSubTarget->getRegisterInfo();
				bool Changed = false;
				SmallVector<PPCEquiRegPairInfo, 8> EquiPairs;
				for (MachineBasicBlock &MBB : *MF) {
				// This optimization is local to each BB. So we clear the information
				// of equal register pairs used in the previous BB.
				EquiPairs.clear();
				for (MachineInstr &MI : MBB) {
				unsigned Opc = MI.getOpcode();
				if (MI.isDebugValue())
				continue;

				// We remember whether we have seen a kill flag for SrcReg of each pair.
				for (auto &MO: MI.operands())
				if (MO.isReg() && MO.isKill())
				for (auto &RegPair: EquiPairs)
				RegPair.checkKillFlags(MO, TRI);

				// Refer to PPCInstrInfo::copyPhysReg to find opcodes used for copying
				// the value in a physical register for each register class.
				// Currently we do not optimize VSX registers (copied by xxlor),
				// which may involve different register types, i.e. FPR and VRF.
				bool IsRegCopy = (Opc == PPC::FMR);
				if ((Opc == PPC::OR8 || Opc == PPC::OR || Opc == PPC::VOR) &&
				MI.getOperand(1).getReg() == MI.getOperand(2).getReg())
				IsRegCopy = true;

				if (IsRegCopy) {
				unsigned DstReg = MI.getOperand(0).getReg();
				unsigned SrcReg = MI.getOperand(1).getReg();

				// We do not optimize gpr0, which may mean constant 0.
				if (TRI->isSuperOrSubRegisterEq(DstReg, PPC::X0) ||
				TRI->isSuperOrSubRegisterEq(SrcReg, PPC::X0))
				continue;

				// Register pairs are eliminated if src or dst is overwritten.
				SmallVector<PPCEquiRegPairInfo, 8> CurrentEquiPairs(EquiPairs);
				EquiPairs.clear();
				for (auto &RegPair: CurrentEquiPairs)
				if (!RegPair.isConflict(MI, TRI, TII))
				EquiPairs.push_back(RegPair);

				// We create a new register pair having the same value.
				EquiPairs.push_back(PPCEquiRegPairInfo(SrcReg, DstReg));

				for (auto &MO: MI.operands())
				if (MO.isReg() && MO.isKill())
				EquiPairs.back().checkKillFlags(MO, TRI);

				continue;
				}

				if (EquiPairs.empty()) continue;

				// Currently, we just invalidate all register pairs at a call.
				// For further opportunity, we can keep reg pairs
				// if both src and dst regs are callee save.
				if (MI.isCall()) {
				EquiPairs.clear();
				continue;
				}

				const MCInstrDesc &MCID = TII->get(Opc);
				SmallVector<PPCEquiRegPairInfo, 8> CurrentEquiPairs(EquiPairs);
				EquiPairs.clear();
				MachineBasicBlock::reverse_iterator MBBI, MBBIE;
				for (auto &RegPair: CurrentEquiPairs) {
				const unsigned SrcReg = RegPair.getSrcReg();
				const unsigned DstReg = RegPair.getDstReg();

				// We check all input registers for finding optimization opportunity.
				for (unsigned I = MCID.getNumDefs(); I < MCID.getNumOperands(); I++) {
				MachineOperand &MO = MI.getOperand(I);

				if (MO.isReg() && MO.getReg() == DstReg &&
				!MO.isTied()) { // we do not optimize ldux etc
				// Here we identified opportunity to use SrcReg instead of DstReg.

				// If this is the last use of DstReg, we move kill flag
				// by reversely iterating instructions
				if (MO.isKill()) {
				bool found = false;
				for (MBBI = MI, MBBIE = MBB.rend();
				MBBI != MBBIE && !found; MBBI++) {
				for (int I = MBBI->getNumOperands() - 1; I >= 0; I--) {
				MachineOperand &MO2 = MBBI->getOperand(I);
				if (MO2.isReg() &&
				TRI->isSuperOrSubRegisterEq(MO2.getReg(), DstReg)) {
				if (!MO2.isDef())
				MO2.setIsKill(true);
				else {
				// todo: Since there is no use, we can remove register copy
				}
				found = true;
				break;
				}
				}
				assert(found);
				}
				}

				Changed = true;
				MO.setIsKill(false);
				MO.setReg(SrcReg);
				DEBUG(dbgs() << "An operand modified in\n ");
				DEBUG(MI.dump());

				// If we have already seen a use of SrcReg with a kill flag,
				// we need to move the kill flag to this instruction.
				if (RegPair.needSrcRegKillFlag()) {
				bool found = false;
				for (MBBI = MI, MBBIE = MBB.rend();
				MBBI != MBBIE && !found; MBBI++) {
				for (int I = MBBI->getNumOperands() - 1; I >= 0; I--) {
				MachineOperand &MO2 = MBBI->getOperand(I);
				if (MO2.isReg() && !MO2.isDef() && MO2.isKill() &&
				TRI->isSuperOrSubRegisterEq(MO2.getReg(), SrcReg)) {
				MO2.setIsKill(false);
				found = true;
				break;
				}
				}
				}
				assert(found);
				MO.setIsKill(true);
				}
				}
				}

				// Register pairs are eliminated if src or dst is overwritten.
				if (!RegPair.isConflict(MI, TRI, TII))
				EquiPairs.push_back(RegPair);
				}
				}
				}
				return Changed;
				}

				} // end anonymous namespace

				INITIALIZE_PASS_BEGIN(PPCPostRAPeephole, DEBUG_TYPE,
				"PowerPC Post RA Peephole Optimization", false, false)
				INITIALIZE_PASS_END(PPCPostRAPeephole, DEBUG_TYPE,
				"PowerPC Post RA Peephole Optimization", false, false)

				char PPCPostRAPeephole::ID = 0;
				FunctionPass*
				llvm::createPPCPostRAPeepholePass() { return new PPCPostRAPeephole(); }
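
The kill-flag bookkeeping in optUseAfterRegCopy is the subtle part of the pass, so here is a minimal standalone sketch of the idea on toy data. The `Operand` and `Inst` structs and the `rewriteUse` helper are hypothetical and are not taken from the patch. The sketch assumes that defs are listed before uses in each instruction, and that neither register is redefined between the copy and the rewritten instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy register operand with the two flags that matter here.
struct Operand {
  std::string Reg;
  bool IsDef;
  bool IsKill; // set on the last use of a value
};

struct Inst {
  std::string Opcode;
  std::vector<Operand> Ops; // convention: defs are listed before uses
};

// Replaces the use Block[Idx].Ops[OpIdx] of a copy destination by the copy
// source Src, keeping kill flags consistent for both registers.
void rewriteUse(std::vector<Inst> &Block, std::size_t Idx, std::size_t OpIdx,
                const std::string &Src) {
  Operand &MO = Block[Idx].Ops[OpIdx];
  assert(!MO.IsDef && "only uses can be rewritten");
  const std::string Dst = MO.Reg;

  // If this was the last use of Dst, the closest earlier reference takes
  // over: an earlier use becomes the new last use. If the closest reference
  // is the def itself, the copy is dead and nothing is marked.
  if (MO.IsKill) {
    bool Found = false;
    for (std::size_t I = Idx; !Found && I-- > 0;) {
      for (std::size_t J = Block[I].Ops.size(); J-- > 0;) {
        Operand &Prev = Block[I].Ops[J];
        if (Prev.Reg != Dst)
          continue;
        if (!Prev.IsDef)
          Prev.IsKill = true;
        Found = true;
        break;
      }
    }
  }

  // If Src was killed earlier (for example by the copy itself), the rewritten
  // operand extends its live range, so the kill flag moves here. The scan
  // stops at a def of Src, because kills before it belong to an older value.
  bool SrcKilledEarlier = false;
  bool Done = false;
  for (std::size_t I = Idx; !Done && I-- > 0;) {
    for (Operand &Prev : Block[I].Ops) {
      if (Prev.Reg != Src)
        continue;
      if (Prev.IsDef) {
        Done = true;
        break;
      }
      if (Prev.IsKill) {
        Prev.IsKill = false;
        SrcKilledEarlier = true;
        Done = true;
        break;
      }
    }
  }

  MO.Reg = Src;
  MO.IsKill = SrcKilledEarlier;
}
```

For `mr X, killed Y; addi Z, killed X`, rewriting the `addi` operand yields `mr X, Y; addi Z, killed Y`: the kill flag on `Y` moves from the copy to the rewritten use.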

lib/Target/PowerPC/PPCTargetMachine.cpp

(58 lines not shown)
  static cl::
  opt<bool> DisableQPXLoadSplat("disable-ppc-qpx-load-splat", cl::Hidden,
    cl::desc("Disable QPX load splat simplification"));

  static cl::
  opt<bool> DisableMIPeephole("disable-ppc-peephole", cl::Hidden,
    cl::desc("Disable machine peepholes for PPC"));

+ static cl::
+ opt<bool> DisablePostRAPeephole("disable-ppc-postra-peephole", cl::Hidden,
+   cl::desc("Disable post RA peepholes for PPC"));
+
  static cl::opt<bool>
  EnableGEPOpt("ppc-gep-opt", cl::Hidden,
    cl::desc("Enable optimizations on complex GEPs"),
    cl::init(true));

  static cl::opt<bool>
  EnablePrefetch("enable-ppc-prefetching",
    cl::desc("disable software prefetching on PPC"),
(338 lines not shown)
    if (getPPCTargetMachine().isPositionIndependent()) {
      addPass(createPPCTLSDynamicCallPass());
    }
    if (EnableExtraTOCRegDeps)
      addPass(createPPCTOCRegDepsPass());
  }

  void PPCPassConfig::addPreSched2() {
    if (getOptLevel() != CodeGenOpt::None) {
+     // Target-specific peephole optimization after register allocation.
+     if (!DisablePostRAPeephole)
+       addPass(createPPCPostRAPeepholePass());
+
      addPass(&IfConverterID);

      // This optimization must happen after anything that might do store-to-load
      // forwarding. Here we're after RA (and, thus, when spills are inserted)
      // but before post-RA scheduling.
      if (!DisableQPXLoadSplat)
        addPass(createPPCQPXLoadSplatPass());
    }
(16 lines not shown)

test/CodeGen/PowerPC/Frames-large.ll

(21 lines not shown)


  ; PPC32-FP: _f1:
  ; PPC32-FP: lis r0, -1
  ; PPC32-FP: stw r31, -4(r1)
  ; PPC32-FP: ori r0, r0, 32736
  ; PPC32-FP: stwux r1, r1, r0
  ; PPC32-FP: mr r31, r1
- ; PPC32-FP: addi r3, r31, 32
+ ; PPC32-FP: addi r3, r1, 32
  ; PPC32-FP: lwz r1, 0(r1)
  ; PPC32-FP: lwz r31, -4(r1)
  ; PPC32-FP: blr


  ; PPC64-NOFP: _f1:
  ; PPC64-NOFP: lis r0, -1
  ; PPC64-NOFP: ori r0, r0, 32720
  ; PPC64-NOFP: stdux r1, r1, r0
  ; PPC64-NOFP: addi r3, r1, 52
  ; PPC64-NOFP: ld r1, 0(r1)
  ; PPC64-NOFP: blr


  ; PPC64-FP: _f1:
  ; PPC64-FP: lis r0, -1
  ; PPC64-FP: std r31, -8(r1)
  ; PPC64-FP: ori r0, r0, 32704
  ; PPC64-FP: stdux r1, r1, r0
  ; PPC64-FP: mr r31, r1
- ; PPC64-FP: addi r3, r31, 60
+ ; PPC64-FP: addi r3, r1, 60
  ; PPC64-FP: ld r1, 0(r1)
  ; PPC64-FP: ld r31, -8(r1)
  ; PPC64-FP: blr

test/CodeGen/PowerPC/eh-dwarf-cfa.ll

  ; RUN: llc < %s | FileCheck %s
  target datalayout = "e-m:e-i64:64-n32:64"
  target triple = "powerpc64le-unknown-linux-gnu"

  define void @_Z1fv() #0 {
  entry:
    %0 = call i8* @llvm.eh.dwarf.cfa(i32 0)
    call void @_Z1gPv(i8* %0)
    ret void

  ; CHECK-LABEL: @_Z1fv
  ; CHECK: stdu 1, -[[SS:[0-9]+]](1)
  ; CHECK: .cfi_def_cfa_offset [[SS]]
  ; CHECK: mr 31, 1
  ; CHECK: .cfi_def_cfa_register r31
- ; CHECK: addi 3, 31, [[SS]]
+ ; CHECK: addi 3, 1, [[SS]]
  ; CHECK-NEXT: bl _Z1gPv
  ; CHECK: blr
  }

  declare void @_Z1gPv(i8*)

  ; Function Attrs: nounwind
  declare i8* @llvm.eh.dwarf.cfa(i32) #1

  attributes #0 = { "no-frame-pointer-elim"="true" "target-cpu"="ppc64le" }
  attributes #1 = { nounwind }

test/CodeGen/PowerPC/fma-mutate.ll

  ; Test several VSX FMA mutation opportunities. The first one isn't a
  ; reasonable transformation because the killed product register is the
  ; same as the FMA target register. The second one is legal. The third
  ; one doesn't fit the feeding-copy pattern.

  ; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -enable-unsafe-fp-math -mattr=+vsx -disable-ppc-vsx-fma-mutation=false | FileCheck %s
  target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
  target triple = "powerpc64-unknown-linux-gnu"

  declare double @llvm.sqrt.f64(double)

  define double @foo3(double %a) nounwind {
    %r = call double @llvm.sqrt.f64(double %a)
    ret double %r

  ; CHECK: @foo3
- ; CHECK: xsnmsubadp [[REG:[0-9]+]], {{[0-9]+}}, [[REG]]
+ ; CHECK: xsnmsubadp [[REG:[0-9]+]], {{[0-9]+}}, {{[0-9]+}}
  ; CHECK: xsmaddmdp
  ; CHECK: xsmaddadp
  }

test/CodeGen/PowerPC/save-cr-ppc32svr4.ll

  ; RUN: llc -march=ppc32 -relocation-model pic < %s | FileCheck %s
  ;
  ; Make sure that the CR register is saved correctly on PPC32/SVR4.

  ; CHECK-LABEL: fred:
  ; CHECK: stwu 1, -32(1)
+ ; CHECK: mfcr [[CR:[0-9]+]]
  ; CHECK: stw 31, 28(1)
- ; CHECK: mr 31, 1
  ; CHECK: stw 30, 24(1)
- ; CHECK: mfcr [[CR:[0-9]+]]
- ; CHECK: stw [[CR]], 20(31)
+ ; CHECK: mr 31, 1
+ ; CHECK: stw [[CR]], 20(1)

  target datalayout = "E-m:e-p:32:32-i64:64-n32"
  target triple = "powerpc-unknown-freebsd"

  ; Function Attrs: norecurse nounwind readnone sspstrong
  define i64 @fred(double %a0) local_unnamed_addr #0 {
  b1:
    %v2 = fcmp olt double %a0, 0x43E0000000000000
(27 lines not shown)

test/CodeGen/PowerPC/sjlj.ll

(63 lines not shown)
  ; CHECK: std
  ; Make sure that we're not saving VRSAVE on non-Darwin:
  ; CHECK-NOT: mfspr

  ; CHECK-DAG: stfd
  ; CHECK-DAG: stxvd2x

  ; CHECK-DAG: addis [[REG:[0-9]+]], 2, env_sigill@toc@ha
- ; CHECK-DAG: std 31, env_sigill@toc@l([[REG]])
+ ; CHECK-DAG: std 1, env_sigill@toc@l([[REG]])
  ; CHECK-DAG: addi [[REGA:[0-9]+]], [[REG]], env_sigill@toc@l
- ; CHECK-DAG: std [[REGA]], [[OFF:[0-9]+]](31) # 8-byte Folded Spill
+ ; CHECK-DAG: std [[REGA]], [[OFF:[0-9]+]](1) # 8-byte Folded Spill
  ; CHECK-DAG: std 1, 16([[REGA]])
  ; CHECK-DAG: std 2, 24([[REGA]])
  ; CHECK: bcl 20, 31, .LBB1_3
  ; CHECK: li 3, 1
  ; CHECK: #EH_SjLj_Setup .LBB1_3
  ; CHECK: b .LBB1_1

  ; CHECK: .LBB1_3:
(78 lines not shown)