This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the scheduler
ClosedPublic

Authored by • tstellarAMD on Aug 23 2016, 2:02 PM.

Details

Reviewers

Commits

rGc2ff0eb69762: AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the scheduler
rL279991: AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the scheduler

Summary

The SILoadStoreOptimizer can now look ahead more than one instruction when
looking for instructions to merge, which greatly improves the number of
loads/stores that we are able to merge.

Moving the pass before scheduling avoids increasing register pressure after
the scheduler, so that the scheduler's register pressure estimates will be
more accurate. It also gives more consistent results, since it is no longer
affected by minor scheduling changes.
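
As a rough illustration of the look-ahead described in the summary (not the pass's actual code), the search can be modelled on a toy instruction list. Everything here is hypothetical: `ToyInst`, `findMergeCandidate`, and the use of plain integers for registers are stand-ins for LLVM's `MachineInstr` machinery. The sketch scans past non-matching instructions, records those that read a value produced by the first load (directly or through another recorded instruction), and stops at the first load it encounters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy instruction; this is not LLVM's MachineInstr.
struct ToyInst {
  bool IsLoad;            // true for the kind of load we try to pair
  int Base;               // base address register (only used for loads)
  int Def;                // register written, or -1 if none
  std::vector<int> Uses;  // registers read
};

// Looks past Insts[I] for a later load with the same base register.
// Returns its index, or Insts.size() if no candidate is found.
// Indices of intervening instructions that depend on Insts[I] are
// appended to ToMove; they would have to be placed after the merged load.
std::size_t findMergeCandidate(const std::vector<ToyInst> &Insts,
                               std::size_t I,
                               std::vector<std::size_t> &ToMove) {
  assert(I < Insts.size() && Insts[I].IsLoad);

  // Registers whose readers must move down with the merged instruction.
  std::vector<int> Defs;
  if (Insts[I].Def >= 0)
    Defs.push_back(Insts[I].Def);

  for (std::size_t J = I + 1; J < Insts.size(); ++J) {
    const ToyInst &Cand = Insts[J];
    if (Cand.IsLoad) {
      if (Cand.Base == Insts[I].Base)
        return J;  // found a load we could pair with
      break;       // simplification: give up at the first unmergeable load
    }

    // Does this instruction read anything defined by the pending set?
    bool ReadsDef = false;
    for (int U : Cand.Uses) {
      if (std::find(Defs.begin(), Defs.end(), U) != Defs.end()) {
        ReadsDef = true;
        break;
      }
    }
    if (ReadsDef) {
      ToMove.push_back(J);
      if (Cand.Def >= 0)
        Defs.push_back(Cand.Def);  // its own readers must move as well
    }
  }

  ToMove.clear();
  return Insts.size();
}
```

For a block consisting of a load, an add that reads the load's result, an unrelated multiply, and a second load from the same base, this sketch reports the second load as the match and the add as the only instruction that has to move.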

Diff Detail

Repository: rL LLVM

Event Timeline

• tstellarAMD updated this revision to Diff 69035.Aug 23 2016, 2:02 PM

• tstellarAMD retitled this revision to AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the scheduler.

• tstellarAMD updated this object.

• tstellarAMD added a reviewer: arsenm.

• tstellarAMD added a subscriber: llvm-commits.

Herald added subscribers: kzhuravl, arsenm. · Aug 23 2016, 2:02 PM

arsenm added inline comments.Aug 23 2016, 2:45 PM

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
499 ↗	(On Diff #69035)	This isn't needed here since addMachineSSAOptimization isn't called with None anyway
547–552 ↗	(On Diff #69035)	The extra run should be removed now
lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
78 ↗	(On Diff #69035)	ArrayRef
84 ↗	(On Diff #69035)	ArrayRef
107 ↗	(On Diff #69035)	Does this need to be required? If it's not required is it available in the current pass pipeline? I think there's a subtarget hook we needed to enable to use this
198 ↗	(On Diff #69035)	Should be a SmallVector
226–249 ↗	(On Diff #69035)	It is not possible to have a sub register def in SSA so I don't think you need any of this

• tstellarAMD added inline comments.Aug 24 2016, 7:26 AM

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
547–552 ↗	(On Diff #69035)	The SIShrinkInstructionsPass depends on the RegisterCoalescer being run. I don't think we can remove it until we fix the SIShrinkInstructionsPass.

Remove extra RegisterCoalescer run, and address other review comments.

• tstellarAMD added inline comments.Aug 24 2016, 6:46 PM

lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
107 ↗	(On Diff #69035)	This is required so we don't regress tests in llvm.memcpy.ll. The alias analysis passes are always run in TargetPassConfig::addIRPasses(). The subtarget hook looks like it is only used by the SelectionDAG passes.
226–249 ↗	(On Diff #69035)	I dropped the code that checks for writes, but the code that checks for reads doesn't have anything to do with sub-registers.
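
For context on why the read check matters: an instruction that reads a value defined by the first load has to end up after the merged instruction. Below is a minimal sketch of that reordering step, using `std::list` as a stand-in for a basic block. `Block`, `moveAfter`, and the string "instructions" are hypothetical and only loosely modelled on the `moveInstsAfter` helper in this diff.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for a basic block: an ordered list of instructions.
using Block = std::list<std::string>;

// Re-inserts every instruction in ToMove directly after Pos, keeping the
// relative order of ToMove. Assumes (as in the scenario discussed here) that
// all instructions in ToMove currently sit before Pos in the block.
void moveAfter(Block &BB, Block::iterator Pos,
               const std::vector<Block::iterator> &ToMove) {
  assert(Pos != BB.end());
  Block::iterator InsertPt = std::next(Pos);
  for (Block::iterator It : ToMove)
    BB.splice(InsertPt, BB, It);  // unlink It and relink it before InsertPt
}
```

`std::list::splice` relinks nodes without invalidating iterators, which is why the sketch can hold on to `InsertPt` while elements move around it.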

arsenm added inline comments.Aug 24 2016, 7:07 PM

lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
107 ↗	(On Diff #69194)	I think when I was looking at the hook it was for AA for scheduling
141–143 ↗	(On Diff #69194)	I think you can directly initialize the SmallVector with the range, SmallVector<> Foo(MI.defs())
149 ↗	(On Diff #69194)	SIInstrInfo
155 ↗	(On Diff #69194)	Can use ->mayLoadOrStore
216 ↗	(On Diff #69194)	ditto
364 ↗	(On Diff #69194)	Missing space
442 ↗	(On Diff #69194)	SmallVector
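
The comment on lines 141–143 suggests building the vector straight from a range instead of appending element by element. The same idea in standard C++ is shown below, as an analogue only: `std::vector` stands in for `SmallVector`, and the hypothetical `ToyInst` with its `defsBegin()`/`defsEnd()` accessors stands in for `MI.defs()`. It makes no claim about which form the final patch used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy instruction; the first NumDefs operands are its defs.
struct ToyInst {
  std::vector<int> Operands;
  std::size_t NumDefs;

  const int *defsBegin() const { return Operands.data(); }
  const int *defsEnd() const {
    assert(NumDefs <= Operands.size());
    return Operands.data() + NumDefs;
  }
};

// Element-by-element copy: the shape the review comment refers to.
std::vector<int> collectDefsLoop(const ToyInst &MI) {
  std::vector<int> Defs;
  for (const int *It = MI.defsBegin(); It != MI.defsEnd(); ++It)
    Defs.push_back(*It);
  return Defs;
}

// Direct construction from the iterator range.
std::vector<int> collectDefsRange(const ToyInst &MI) {
  return std::vector<int>(MI.defsBegin(), MI.defsEnd());
}
```

Both functions produce the same contents; the range form is shorter and lets the container size itself once.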

Address review comments.

LGTM

This revision is now accepted and ready to land.Aug 25 2016, 10:00 AM

Closed by commit rL279991: AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the scheduler (authored by tstellar). · Aug 29 2016, 12:23 PM

This revision was automatically updated to reflect the committed changes.

foad mentioned this in D110238: [LiveIntervals] Fix repairOldRegInRange for simple def cases.Sep 22 2021, 6:05 AM

foad mentioned this in rG8229cb741253: [LiveIntervals] Fix repairOldRegInRange for simple def cases.Sep 23 2021, 9:16 AM

foad mentioned this in rG7863cc6c1c9e: [LiveIntervals] Fix repairOldRegInRange for simple def cases.Sep 24 2021, 3:57 AM

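Revision Contents

Path	Size
llvm/trunk/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp	12 lines
llvm/trunk/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp	245 lines
llvm/trunk/test/CodeGen/AMDGPU/ds_read2_offset_order.ll	4 lines
llvm/trunk/test/CodeGen/AMDGPU/ds_write2.ll	8 lines
llvm/trunk/test/CodeGen/AMDGPU/fceil64.ll	4 lines
llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.rsq.clamp.ll	3 lines
llvm/trunk/test/CodeGen/AMDGPU/load-local-i16.ll	54 lines
llvm/trunk/test/CodeGen/AMDGPU/load-local-i32.ll	12 lines
llvm/trunk/test/CodeGen/AMDGPU/local-memory.amdgcn.ll	3 lines
llvm/trunk/test/CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll	3 lines
llvm/trunk/test/CodeGen/AMDGPU/store-v3i64.ll	3 lines
llvm/trunk/test/CodeGen/AMDGPU/use-sgpr-multiple-times.ll	6 lines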

Diff 69600

llvm/trunk/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

[491 lines not shown]	void GCNPassConfig::addMachineSSAOptimization() {
// it), because it will eliminate extra copies making it easier to fold the		// it), because it will eliminate extra copies making it easier to fold the
// real source operand. We want to eliminate dead instructions after, so that		// real source operand. We want to eliminate dead instructions after, so that
// we see fewer uses of the copies. We then need to clean up the dead		// we see fewer uses of the copies. We then need to clean up the dead
// instructions leftover after the operands are folded as well.		// instructions leftover after the operands are folded as well.
//		//
// XXX - Can we get away without running DeadMachineInstructionElim again?		// XXX - Can we get away without running DeadMachineInstructionElim again?
addPass(&SIFoldOperandsID);		addPass(&SIFoldOperandsID);
addPass(&DeadMachineInstructionElimID);		addPass(&DeadMachineInstructionElimID);
		addPass(&SILoadStoreOptimizerID);
}		}

void GCNPassConfig::addIRPasses() {		void GCNPassConfig::addIRPasses() {
// TODO: May want to move later or split into an early and late one.		// TODO: May want to move later or split into an early and late one.
addPass(createAMDGPUCodeGenPreparePass(&getGCNTargetMachine()));		addPass(createAMDGPUCodeGenPreparePass(&getGCNTargetMachine()));

AMDGPUPassConfig::addIRPasses();		AMDGPUPassConfig::addIRPasses();
}		}
[20 lines not shown]
}		}

bool GCNPassConfig::addGlobalInstructionSelect() {		bool GCNPassConfig::addGlobalInstructionSelect() {
return false;		return false;
}		}
#endif		#endif

void GCNPassConfig::addPreRegAlloc() {		void GCNPassConfig::addPreRegAlloc() {
if (getOptLevel() > CodeGenOpt::None) {
// Don't do this with no optimizations since it throws away debug info by
// merging nonadjacent loads.

// This should be run after scheduling, but before register allocation. It
// also needs extra copies to the address operand to be eliminated.

// FIXME: Move pre-RA and remove extra reg coalescer run.
insertPass(&MachineSchedulerID, &SILoadStoreOptimizerID);
insertPass(&MachineSchedulerID, &RegisterCoalescerID);
}

addPass(createSIShrinkInstructionsPass());		addPass(createSIShrinkInstructionsPass());
addPass(createSIWholeQuadModePass());		addPass(createSIWholeQuadModePass());
}		}

void GCNPassConfig::addFastRegAlloc(FunctionPass *RegAllocPass) {		void GCNPassConfig::addFastRegAlloc(FunctionPass *RegAllocPass) {
// FIXME: We have to disable the verifier here because of PHIElimination +		// FIXME: We have to disable the verifier here because of PHIElimination +
// TwoAddressInstructions disabling it.		// TwoAddressInstructions disabling it.
[40 lines not shown]

llvm/trunk/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp

[54 lines not shown]

namespace {		namespace {

class SILoadStoreOptimizer : public MachineFunctionPass {		class SILoadStoreOptimizer : public MachineFunctionPass {
private:		private:
const SIInstrInfo *TII;		const SIInstrInfo *TII;
const SIRegisterInfo *TRI;		const SIRegisterInfo *TRI;
MachineRegisterInfo *MRI;		MachineRegisterInfo *MRI;
LiveIntervals *LIS;		AliasAnalysis *AA;

static bool offsetsCanBeCombined(unsigned Offset0,		static bool offsetsCanBeCombined(unsigned Offset0,
unsigned Offset1,		unsigned Offset1,
unsigned EltSize);		unsigned EltSize);

MachineBasicBlock::iterator findMatchingDSInst(MachineBasicBlock::iterator I,		MachineBasicBlock::iterator findMatchingDSInst(
unsigned EltSize);		MachineBasicBlock::iterator I,
		unsigned EltSize,
		SmallVectorImpl<MachineInstr*> &InstsToMove);

MachineBasicBlock::iterator mergeRead2Pair(		MachineBasicBlock::iterator mergeRead2Pair(
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
MachineBasicBlock::iterator Paired,		MachineBasicBlock::iterator Paired,
unsigned EltSize);		unsigned EltSize,
		ArrayRef<MachineInstr*> InstsToMove);

MachineBasicBlock::iterator mergeWrite2Pair(		MachineBasicBlock::iterator mergeWrite2Pair(
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
MachineBasicBlock::iterator Paired,		MachineBasicBlock::iterator Paired,
unsigned EltSize);		unsigned EltSize,
		ArrayRef<MachineInstr*> InstsToMove);

public:		public:
static char ID;		static char ID;

SILoadStoreOptimizer()		SILoadStoreOptimizer()
: MachineFunctionPass(ID), TII(nullptr), TRI(nullptr), MRI(nullptr),		: MachineFunctionPass(ID), TII(nullptr), TRI(nullptr), MRI(nullptr),
LIS(nullptr) {}		AA(nullptr) {}

SILoadStoreOptimizer(const TargetMachine &TM_) : MachineFunctionPass(ID) {		SILoadStoreOptimizer(const TargetMachine &TM_) : MachineFunctionPass(ID) {
initializeSILoadStoreOptimizerPass(*PassRegistry::getPassRegistry());		initializeSILoadStoreOptimizerPass(*PassRegistry::getPassRegistry());
}		}

bool optimizeBlock(MachineBasicBlock &MBB);		bool optimizeBlock(MachineBasicBlock &MBB);

bool runOnMachineFunction(MachineFunction &MF) override;		bool runOnMachineFunction(MachineFunction &MF) override;

const char *getPassName() const override {		const char *getPassName() const override {
return "SI Load / Store Optimizer";		return "SI Load / Store Optimizer";
}		}

MachineFunctionProperties getRequiredProperties() const override {
return MachineFunctionProperties().set(
MachineFunctionProperties::Property::NoPHIs);
}

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesCFG();		AU.setPreservesCFG();
AU.addPreserved<SlotIndexes>();		AU.addRequired<AAResultsWrapperPass>();
AU.addPreserved<LiveIntervals>();
AU.addPreserved<LiveVariables>();
AU.addRequired<LiveIntervals>();

MachineFunctionPass::getAnalysisUsage(AU);		MachineFunctionPass::getAnalysisUsage(AU);
}		}
};		};

} // End anonymous namespace.		} // End anonymous namespace.

INITIALIZE_PASS_BEGIN(SILoadStoreOptimizer, DEBUG_TYPE,		INITIALIZE_PASS_BEGIN(SILoadStoreOptimizer, DEBUG_TYPE,
"SI Load / Store Optimizer", false, false)		"SI Load / Store Optimizer", false, false)
INITIALIZE_PASS_DEPENDENCY(LiveIntervals)		INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
INITIALIZE_PASS_DEPENDENCY(LiveVariables)
INITIALIZE_PASS_DEPENDENCY(SlotIndexes)
INITIALIZE_PASS_END(SILoadStoreOptimizer, DEBUG_TYPE,		INITIALIZE_PASS_END(SILoadStoreOptimizer, DEBUG_TYPE,
"SI Load / Store Optimizer", false, false)		"SI Load / Store Optimizer", false, false)

char SILoadStoreOptimizer::ID = 0;		char SILoadStoreOptimizer::ID = 0;

char &llvm::SILoadStoreOptimizerID = SILoadStoreOptimizer::ID;		char &llvm::SILoadStoreOptimizerID = SILoadStoreOptimizer::ID;

FunctionPass *llvm::createSILoadStoreOptimizerPass(TargetMachine &TM) {		FunctionPass *llvm::createSILoadStoreOptimizerPass(TargetMachine &TM) {
return new SILoadStoreOptimizer(TM);		return new SILoadStoreOptimizer(TM);
}		}

		static void moveInstsAfter(MachineBasicBlock::iterator I,
		ArrayRef<MachineInstr*> InstsToMove) {
		MachineBasicBlock *MBB = I->getParent();
		++I;
		for (MachineInstr *MI : InstsToMove) {
		MI->removeFromParent();
		MBB->insert(I, MI);
		}
		}

		static void addDefsToList(const MachineInstr &MI,
		SmallVectorImpl<const MachineOperand *> &Defs) {
		for (const MachineOperand &Def : MI.defs()) {
		Defs.push_back(&Def);
		}
		}

		static bool
		canMoveInstsAcrossMemOp(MachineInstr &MemOp,
		ArrayRef<MachineInstr*> InstsToMove,
		const SIInstrInfo *TII,
		AliasAnalysis *AA) {

		assert(MemOp.mayLoadOrStore());

		for (MachineInstr *InstToMove : InstsToMove) {
		if (!InstToMove->mayLoadOrStore())
		continue;
		if (!TII->areMemAccessesTriviallyDisjoint(MemOp, *InstToMove, AA))
		return false;
		}
		return true;
		}

bool SILoadStoreOptimizer::offsetsCanBeCombined(unsigned Offset0,		bool SILoadStoreOptimizer::offsetsCanBeCombined(unsigned Offset0,
unsigned Offset1,		unsigned Offset1,
unsigned Size) {		unsigned Size) {
// XXX - Would the same offset be OK? Is there any reason this would happen or		// XXX - Would the same offset be OK? Is there any reason this would happen or
// be useful?		// be useful?
if (Offset0 == Offset1)		if (Offset0 == Offset1)
return false;		return false;

[13 lines not shown]	bool SILoadStoreOptimizer::offsetsCanBeCombined(unsigned Offset0,
if ((EltOffset0 % 64 != 0) || (EltOffset1 % 64) != 0)		if ((EltOffset0 % 64 != 0) || (EltOffset1 % 64) != 0)
return false;		return false;

return isUInt<8>(EltOffset0 / 64) && isUInt<8>(EltOffset1 / 64);		return isUInt<8>(EltOffset0 / 64) && isUInt<8>(EltOffset1 / 64);
}		}

MachineBasicBlock::iterator		MachineBasicBlock::iterator
SILoadStoreOptimizer::findMatchingDSInst(MachineBasicBlock::iterator I,		SILoadStoreOptimizer::findMatchingDSInst(MachineBasicBlock::iterator I,
unsigned EltSize){		unsigned EltSize,
		SmallVectorImpl<MachineInstr*> &InstsToMove) {
MachineBasicBlock::iterator E = I->getParent()->end();		MachineBasicBlock::iterator E = I->getParent()->end();
MachineBasicBlock &MBB = *I->getParent();
MachineBasicBlock::iterator MBBI = I;		MachineBasicBlock::iterator MBBI = I;
++MBBI;		++MBBI;

if (MBBI == MBB.end() || MBBI->getOpcode() != I->getOpcode())		SmallVector<const MachineOperand *, 8> DefsToMove;
		addDefsToList(*I, DefsToMove);

		for ( ; MBBI != E; ++MBBI) {

		if (MBBI->getOpcode() != I->getOpcode()) {

		// This is not a matching DS instruction, but we can keep looking as
		// long as one of these conditions is met:
		// 1. It is safe to move I down past MBBI.
		// 2. It is safe to move MBBI down past the instruction that I will
		// be merged into.

		if (MBBI->hasUnmodeledSideEffects())
		// We can't re-order this instruction with respect to other memory
		// operations, so we fail both conditions mentioned above.
return E;		return E;

		if (MBBI->mayLoadOrStore() &&
		!TII->areMemAccessesTriviallyDisjoint(I, MBBI, AA)) {
		// We fail condition #1, but we may still be able to satisfy condition
		// #2. Add this instruction to the move list and then we will check
		// if condition #2 holds once we have selected the matching instruction.
		InstsToMove.push_back(&*MBBI);
		addDefsToList(*MBBI, DefsToMove);
		continue;
		}

		// When we match I with another DS instruction we will be moving I down
		// to the location of the matched instruction, so any uses of I will
		// need to be moved down as well.
		for (const MachineOperand *Def : DefsToMove) {
		bool ReadDef = MBBI->readsVirtualRegister(Def->getReg());
		// If ReadDef is true, then there is a use of Def between I
		// and the instruction that I will potentially be merged with. We
		// will need to move this instruction after the merged instructions.
		if (ReadDef) {
		InstsToMove.push_back(&*MBBI);
		addDefsToList(*MBBI, DefsToMove);
		break;
		}
		}
		continue;
		}

// Don't merge volatiles.		// Don't merge volatiles.
if (MBBI->hasOrderedMemoryRef())		if (MBBI->hasOrderedMemoryRef())
return E;		return E;

int AddrIdx = AMDGPU::getNamedOperandIdx(I->getOpcode(), AMDGPU::OpName::addr);		int AddrIdx = AMDGPU::getNamedOperandIdx(I->getOpcode(), AMDGPU::OpName::addr);
const MachineOperand &AddrReg0 = I->getOperand(AddrIdx);		const MachineOperand &AddrReg0 = I->getOperand(AddrIdx);
const MachineOperand &AddrReg1 = MBBI->getOperand(AddrIdx);		const MachineOperand &AddrReg1 = MBBI->getOperand(AddrIdx);

// Check same base pointer. Be careful of subregisters, which can occur with		// Check same base pointer. Be careful of subregisters, which can occur with
// vectors of pointers.		// vectors of pointers.
if (AddrReg0.getReg() == AddrReg1.getReg() &&		if (AddrReg0.getReg() == AddrReg1.getReg() &&
AddrReg0.getSubReg() == AddrReg1.getSubReg()) {		AddrReg0.getSubReg() == AddrReg1.getSubReg()) {
int OffsetIdx = AMDGPU::getNamedOperandIdx(I->getOpcode(),		int OffsetIdx = AMDGPU::getNamedOperandIdx(I->getOpcode(),
AMDGPU::OpName::offset);		AMDGPU::OpName::offset);
unsigned Offset0 = I->getOperand(OffsetIdx).getImm() & 0xffff;		unsigned Offset0 = I->getOperand(OffsetIdx).getImm() & 0xffff;
unsigned Offset1 = MBBI->getOperand(OffsetIdx).getImm() & 0xffff;		unsigned Offset1 = MBBI->getOperand(OffsetIdx).getImm() & 0xffff;

// Check both offsets fit in the reduced range.		// Check both offsets fit in the reduced range.
if (offsetsCanBeCombined(Offset0, Offset1, EltSize))		// We also need to go through the list of instructions that we plan to
		// move and make sure they are all safe to move down past the merged
		// instruction.
		if (offsetsCanBeCombined(Offset0, Offset1, EltSize) &&
		canMoveInstsAcrossMemOp(*MBBI, InstsToMove, TII, AA))
return MBBI;		return MBBI;
}		}

		// We've found a load/store that we couldn't merge for some reason.
		// We could potentially keep looking, but we'd need to make sure that
		// it was safe to move I and also all the instructions in InstsToMove
		// down past this instruction.
		// FIXME: This is too conservative.
		break;
		}
return E;		return E;
}		}

MachineBasicBlock::iterator SILoadStoreOptimizer::mergeRead2Pair(		MachineBasicBlock::iterator SILoadStoreOptimizer::mergeRead2Pair(
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
MachineBasicBlock::iterator Paired,		MachineBasicBlock::iterator Paired,
unsigned EltSize) {		unsigned EltSize,
		ArrayRef<MachineInstr*> InstsToMove) {
MachineBasicBlock *MBB = I->getParent();		MachineBasicBlock *MBB = I->getParent();

// Be careful, since the addresses could be subregisters themselves in weird		// Be careful, since the addresses could be subregisters themselves in weird
// cases, like vectors of pointers.		// cases, like vectors of pointers.
const MachineOperand AddrReg = TII->getNamedOperand(I, AMDGPU::OpName::addr);		const MachineOperand AddrReg = TII->getNamedOperand(I, AMDGPU::OpName::addr);

const MachineOperand Dest0 = TII->getNamedOperand(I, AMDGPU::OpName::vdst);		const MachineOperand Dest0 = TII->getNamedOperand(I, AMDGPU::OpName::vdst);
const MachineOperand Dest1 = TII->getNamedOperand(Paired, AMDGPU::OpName::vdst);		const MachineOperand Dest1 = TII->getNamedOperand(Paired, AMDGPU::OpName::vdst);
[32 lines not shown]	MachineBasicBlock::iterator SILoadStoreOptimizer::mergeRead2Pair(
const MCInstrDesc &Read2Desc = TII->get(Opc);		const MCInstrDesc &Read2Desc = TII->get(Opc);

const TargetRegisterClass *SuperRC		const TargetRegisterClass *SuperRC
= (EltSize == 4) ? &AMDGPU::VReg_64RegClass : &AMDGPU::VReg_128RegClass;		= (EltSize == 4) ? &AMDGPU::VReg_64RegClass : &AMDGPU::VReg_128RegClass;
unsigned DestReg = MRI->createVirtualRegister(SuperRC);		unsigned DestReg = MRI->createVirtualRegister(SuperRC);

DebugLoc DL = I->getDebugLoc();		DebugLoc DL = I->getDebugLoc();
MachineInstrBuilder Read2		MachineInstrBuilder Read2
= BuildMI(*MBB, I, DL, Read2Desc, DestReg)		= BuildMI(*MBB, Paired, DL, Read2Desc, DestReg)
.addOperand(*AddrReg) // addr		.addOperand(*AddrReg) // addr
.addImm(NewOffset0) // offset0		.addImm(NewOffset0) // offset0
.addImm(NewOffset1) // offset1		.addImm(NewOffset1) // offset1
.addImm(0) // gds		.addImm(0) // gds
.addMemOperand(*I->memoperands_begin())		.addMemOperand(*I->memoperands_begin())
.addMemOperand(*Paired->memoperands_begin());		.addMemOperand(*Paired->memoperands_begin());

const MCInstrDesc &CopyDesc = TII->get(TargetOpcode::COPY);		const MCInstrDesc &CopyDesc = TII->get(TargetOpcode::COPY);

// Copy to the old destination registers.		// Copy to the old destination registers.
MachineInstr Copy0 = BuildMI(MBB, I, DL, CopyDesc)		BuildMI(*MBB, Paired, DL, CopyDesc)
.addOperand(*Dest0) // Copy to same destination including flags and sub reg.		.addOperand(*Dest0) // Copy to same destination including flags and sub reg.
.addReg(DestReg, 0, SubRegIdx0);		.addReg(DestReg, 0, SubRegIdx0);
MachineInstr Copy1 = BuildMI(MBB, I, DL, CopyDesc)		MachineInstr Copy1 = BuildMI(MBB, Paired, DL, CopyDesc)
.addOperand(*Dest1)		.addOperand(*Dest1)
.addReg(DestReg, RegState::Kill, SubRegIdx1);		.addReg(DestReg, RegState::Kill, SubRegIdx1);

LIS->InsertMachineInstrInMaps(*Read2);		moveInstsAfter(Copy1, InstsToMove);

// repairLiveintervalsInRange() doesn't handle physical register, so we have
// to update the M0 range manually.
SlotIndex PairedIndex = LIS->getInstructionIndex(*Paired);
LiveRange &M0Range = LIS->getRegUnit(*MCRegUnitIterator(AMDGPU::M0, TRI));
LiveRange::Segment *M0Segment = M0Range.getSegmentContaining(PairedIndex);
bool UpdateM0Range = M0Segment->end == PairedIndex.getRegSlot();

// The new write to the original destination register is now the copy. Steal
// the old SlotIndex.
LIS->ReplaceMachineInstrInMaps(I, Copy0);
LIS->ReplaceMachineInstrInMaps(Paired, Copy1);

		MachineBasicBlock::iterator Next = std::next(I);
I->eraseFromParent();		I->eraseFromParent();
Paired->eraseFromParent();		Paired->eraseFromParent();

LiveInterval &AddrRegLI = LIS->getInterval(AddrReg->getReg());
LIS->shrinkToUses(&AddrRegLI);

LIS->createAndComputeVirtRegInterval(DestReg);

if (UpdateM0Range) {
SlotIndex Read2Index = LIS->getInstructionIndex(*Read2);
M0Segment->end = Read2Index.getRegSlot();
}

DEBUG(dbgs() << "Inserted read2: " << *Read2 << '\n');		DEBUG(dbgs() << "Inserted read2: " << *Read2 << '\n');
return Read2.getInstr();		return Next;
}		}

MachineBasicBlock::iterator SILoadStoreOptimizer::mergeWrite2Pair(		MachineBasicBlock::iterator SILoadStoreOptimizer::mergeWrite2Pair(
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
MachineBasicBlock::iterator Paired,		MachineBasicBlock::iterator Paired,
unsigned EltSize) {		unsigned EltSize,
		ArrayRef<MachineInstr*> InstsToMove) {
MachineBasicBlock *MBB = I->getParent();		MachineBasicBlock *MBB = I->getParent();

// Be sure to use .addOperand(), and not .addReg() with these. We want to be		// Be sure to use .addOperand(), and not .addReg() with these. We want to be
// sure we preserve the subregister index and any register flags set on them.		// sure we preserve the subregister index and any register flags set on them.
const MachineOperand Addr = TII->getNamedOperand(I, AMDGPU::OpName::addr);		const MachineOperand Addr = TII->getNamedOperand(I, AMDGPU::OpName::addr);
const MachineOperand Data0 = TII->getNamedOperand(I, AMDGPU::OpName::data0);		const MachineOperand Data0 = TII->getNamedOperand(I, AMDGPU::OpName::data0);
const MachineOperand *Data1		const MachineOperand *Data1
= TII->getNamedOperand(*Paired, AMDGPU::OpName::data0);		= TII->getNamedOperand(*Paired, AMDGPU::OpName::data0);
[25 lines not shown]	MachineBasicBlock::iterator SILoadStoreOptimizer::mergeWrite2Pair(

assert((isUInt<8>(NewOffset0) && isUInt<8>(NewOffset1)) &&		assert((isUInt<8>(NewOffset0) && isUInt<8>(NewOffset1)) &&
(NewOffset0 != NewOffset1) &&		(NewOffset0 != NewOffset1) &&
"Computed offset doesn't fit");		"Computed offset doesn't fit");

const MCInstrDesc &Write2Desc = TII->get(Opc);		const MCInstrDesc &Write2Desc = TII->get(Opc);
DebugLoc DL = I->getDebugLoc();		DebugLoc DL = I->getDebugLoc();

// repairLiveintervalsInRange() doesn't handle physical register, so we have
// to update the M0 range manually.
SlotIndex PairedIndex = LIS->getInstructionIndex(*Paired);
LiveRange &M0Range = LIS->getRegUnit(*MCRegUnitIterator(AMDGPU::M0, TRI));
LiveRange::Segment *M0Segment = M0Range.getSegmentContaining(PairedIndex);
bool UpdateM0Range = M0Segment->end == PairedIndex.getRegSlot();

MachineInstrBuilder Write2		MachineInstrBuilder Write2
= BuildMI(*MBB, I, DL, Write2Desc)		= BuildMI(*MBB, Paired, DL, Write2Desc)
.addOperand(*Addr) // addr		.addOperand(*Addr) // addr
.addOperand(*Data0) // data0		.addOperand(*Data0) // data0
.addOperand(*Data1) // data1		.addOperand(*Data1) // data1
.addImm(NewOffset0) // offset0		.addImm(NewOffset0) // offset0
.addImm(NewOffset1) // offset1		.addImm(NewOffset1) // offset1
.addImm(0) // gds		.addImm(0) // gds
.addMemOperand(*I->memoperands_begin())		.addMemOperand(*I->memoperands_begin())
.addMemOperand(*Paired->memoperands_begin());		.addMemOperand(*Paired->memoperands_begin());

// XXX - How do we express subregisters here?		moveInstsAfter(Write2, InstsToMove);
unsigned OrigRegs[] = { Data0->getReg(), Data1->getReg(), Addr->getReg() };

LIS->RemoveMachineInstrFromMaps(*I);		MachineBasicBlock::iterator Next = std::next(I);
LIS->RemoveMachineInstrFromMaps(*Paired);
I->eraseFromParent();		I->eraseFromParent();
Paired->eraseFromParent();		Paired->eraseFromParent();

// This doesn't handle physical registers like M0
LIS->repairIntervalsInRange(MBB, Write2, Write2, OrigRegs);

if (UpdateM0Range) {
SlotIndex Write2Index = LIS->getInstructionIndex(*Write2);
M0Segment->end = Write2Index.getRegSlot();
}

DEBUG(dbgs() << "Inserted write2 inst: " << *Write2 << '\n');		DEBUG(dbgs() << "Inserted write2 inst: " << *Write2 << '\n');
return Write2.getInstr();		return Next;
}		}

// Scan through looking for adjacent LDS operations with constant offsets from		// Scan through looking for adjacent LDS operations with constant offsets from
// the same base register. We rely on the scheduler to do the hard work of		// the same base register. We rely on the scheduler to do the hard work of
// clustering nearby loads, and assume these are all adjacent.		// clustering nearby loads, and assume these are all adjacent.
bool SILoadStoreOptimizer::optimizeBlock(MachineBasicBlock &MBB) {		bool SILoadStoreOptimizer::optimizeBlock(MachineBasicBlock &MBB) {
bool Modified = false;		bool Modified = false;

for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end(); I != E;) {		for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end(); I != E;) {
MachineInstr &MI = *I;		MachineInstr &MI = *I;

// Don't combine if volatile.		// Don't combine if volatile.
if (MI.hasOrderedMemoryRef()) {		if (MI.hasOrderedMemoryRef()) {
++I;		++I;
continue;		continue;
}		}

		SmallVector<MachineInstr*, 8> InstsToMove;
unsigned Opc = MI.getOpcode();		unsigned Opc = MI.getOpcode();
if (Opc == AMDGPU::DS_READ_B32 || Opc == AMDGPU::DS_READ_B64) {		if (Opc == AMDGPU::DS_READ_B32 || Opc == AMDGPU::DS_READ_B64) {
unsigned Size = (Opc == AMDGPU::DS_READ_B64) ? 8 : 4;		unsigned Size = (Opc == AMDGPU::DS_READ_B64) ? 8 : 4;
MachineBasicBlock::iterator Match = findMatchingDSInst(I, Size);		MachineBasicBlock::iterator Match = findMatchingDSInst(I, Size,
		InstsToMove);
if (Match != E) {		if (Match != E) {
Modified = true;		Modified = true;
I = mergeRead2Pair(I, Match, Size);		I = mergeRead2Pair(I, Match, Size, InstsToMove);
} else {		} else {
++I;		++I;
}		}

continue;		continue;
} else if (Opc == AMDGPU::DS_WRITE_B32 || Opc == AMDGPU::DS_WRITE_B64) {		} else if (Opc == AMDGPU::DS_WRITE_B32 || Opc == AMDGPU::DS_WRITE_B64) {
unsigned Size = (Opc == AMDGPU::DS_WRITE_B64) ? 8 : 4;		unsigned Size = (Opc == AMDGPU::DS_WRITE_B64) ? 8 : 4;
MachineBasicBlock::iterator Match = findMatchingDSInst(I, Size);		MachineBasicBlock::iterator Match = findMatchingDSInst(I, Size,
		InstsToMove);
if (Match != E) {		if (Match != E) {
Modified = true;		Modified = true;
I = mergeWrite2Pair(I, Match, Size);		I = mergeWrite2Pair(I, Match, Size, InstsToMove);
} else {		} else {
++I;		++I;
}		}

continue;		continue;
}		}

++I;		++I;
[9 lines not shown]	bool SILoadStoreOptimizer::runOnMachineFunction(MachineFunction &MF) {
const SISubtarget &STM = MF.getSubtarget<SISubtarget>();		const SISubtarget &STM = MF.getSubtarget<SISubtarget>();
if (!STM.loadStoreOptEnabled())		if (!STM.loadStoreOptEnabled())
return false;		return false;

TII = STM.getInstrInfo();		TII = STM.getInstrInfo();
TRI = &TII->getRegisterInfo();		TRI = &TII->getRegisterInfo();

MRI = &MF.getRegInfo();		MRI = &MF.getRegInfo();
		AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
LIS = &getAnalysis<LiveIntervals>();

DEBUG(dbgs() << "Running SILoadStoreOptimizer\n");		DEBUG(dbgs() << "Running SILoadStoreOptimizer\n");

bool Modified = false;		bool Modified = false;

for (MachineBasicBlock &MBB : MF)		for (MachineBasicBlock &MBB : MF)
Modified |= optimizeBlock(MBB);		Modified |= optimizeBlock(MBB);

return Modified;		return Modified;
}		}
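
The tail of `offsetsCanBeCombined` visible in this diff accepts two element offsets when both are multiples of 64 and the quotients fit in 8 bits. A self-contained sketch of how such a check could look end to end follows. The first half of the function is elided from the diff view, so the alignment test, the byte-to-element conversion, and the plain 8-bit check are assumptions made for illustration; `offsetsCanBeCombinedSketch` and `fitsInUInt8` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::isUInt<8>.
static bool fitsInUInt8(unsigned V) { return V <= 0xFFu; }

// Offsets are in bytes; EltSize is the access size in bytes (4 or 8).
bool offsetsCanBeCombinedSketch(unsigned Offset0, unsigned Offset1,
                                unsigned EltSize) {
  assert(EltSize != 0);

  // Visible in the diff: identical offsets are rejected.
  if (Offset0 == Offset1)
    return false;

  // ASSUMPTION (elided in the diff view): offsets must be element-aligned.
  if (Offset0 % EltSize != 0 || Offset1 % EltSize != 0)
    return false;

  // ASSUMPTION (elided in the diff view): work in units of elements.
  unsigned EltOffset0 = Offset0 / EltSize;
  unsigned EltOffset1 = Offset1 / EltSize;

  // ASSUMPTION (elided in the diff view): try the plain 8-bit encoding first.
  if (fitsInUInt8(EltOffset0) && fitsInUInt8(EltOffset1))
    return true;

  // Visible in the diff: otherwise both must be multiples of 64 whose
  // quotients fit in 8 bits.
  if ((EltOffset0 % 64 != 0) || (EltOffset1 % 64 != 0))
    return false;

  return fitsInUInt8(EltOffset0 / 64) && fitsInUInt8(EltOffset1 / 64);
}
```

Under these assumptions, byte offsets 8 and 12 with 4-byte elements combine directly (element offsets 2 and 3), 0 and 1024 combine through the multiple-of-64 path (element offsets 0 and 256), and 0 and 1028 do not combine because element offset 257 neither fits in 8 bits nor is a multiple of 64.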

llvm/trunk/test/CodeGen/AMDGPU/ds_read2_offset_order.ll

	; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs -mattr=+load-store-opt < %s | FileCheck -strict-whitespace -check-prefix=SI %s			; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs -mattr=+load-store-opt < %s | FileCheck -strict-whitespace -check-prefix=SI %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs -mattr=+load-store-opt < %s | FileCheck -strict-whitespace -check-prefix=SI %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs -mattr=+load-store-opt < %s | FileCheck -strict-whitespace -check-prefix=SI %s


	@lds = addrspace(3) global [512 x float] undef, align 4			@lds = addrspace(3) global [512 x float] undef, align 4

	; offset0 is larger than offset1			; offset0 is larger than offset1

	; SI-LABEL: {{^}}offset_order:			; SI-LABEL: {{^}}offset_order:

	; SI: ds_read2_b32 v[{{[0-9]+}}:{{[0-9]+}}], v{{[0-9]+}} offset0:2 offset1:3			; SI: ds_read2_b32 v[{{[0-9]+}}:{{[0-9]+}}], v{{[0-9]+}} offset0:2 offset1:3
	; SI: ds_read2_b32 v[{{[0-9]+}}:{{[0-9]+}}], v{{[0-9]+}} offset0:12 offset1:14			; SI-DAG: ds_read2_b32 v[{{[0-9]+}}:{{[0-9]+}}], v{{[0-9]+}} offset0:12 offset1:14
	; SI: ds_read_b32 v{{[0-9]+}}, v{{[0-9]+}} offset:44			; SI-DAG: ds_read_b32 v{{[0-9]+}}, v{{[0-9]+}} offset:44

	define void @offset_order(float addrspace(1)* %out) {			define void @offset_order(float addrspace(1)* %out) {
	entry:			entry:
	%ptr0 = getelementptr inbounds [512 x float], [512 x float] addrspace(3)* @lds, i32 0, i32 0			%ptr0 = getelementptr inbounds [512 x float], [512 x float] addrspace(3)* @lds, i32 0, i32 0
	%val0 = load float, float addrspace(3)* %ptr0			%val0 = load float, float addrspace(3)* %ptr0

	%ptr1 = getelementptr inbounds [512 x float], [512 x float] addrspace(3)* @lds, i32 0, i32 256			%ptr1 = getelementptr inbounds [512 x float], [512 x float] addrspace(3)* @lds, i32 0, i32 256
	%val1 = load float, float addrspace(3)* %ptr1			%val1 = load float, float addrspace(3)* %ptr1
	[24 lines not shown]

llvm/trunk/test/CodeGen/AMDGPU/ds_write2.ll

[173 lines not shown]	define void @simple_write2_two_val_too_far_f32(float addrspace(1)* %C, float addrspace(1)* %in0, float addrspace(1)* %in1) #0 {
store float %val0, float addrspace(3)* %arrayidx0, align 4		store float %val0, float addrspace(3)* %arrayidx0, align 4
%add.x = add nsw i32 %x.i, 257		%add.x = add nsw i32 %x.i, 257
%arrayidx1 = getelementptr inbounds [512 x float], [512 x float] addrspace(3)* @lds, i32 0, i32 %add.x		%arrayidx1 = getelementptr inbounds [512 x float], [512 x float] addrspace(3)* @lds, i32 0, i32 %add.x
store float %val1, float addrspace(3)* %arrayidx1, align 4		store float %val1, float addrspace(3)* %arrayidx1, align 4
ret void		ret void
}		}

; SI-LABEL: @simple_write2_two_val_f32_x2		; SI-LABEL: @simple_write2_two_val_f32_x2
; SI: ds_write2_b32 [[BASEADDR:v[0-9]+]], [[VAL0:v[0-9]+]], [[VAL0]] offset1:11		; SI: ds_write2_b32 [[BASEADDR:v[0-9]+]], [[VAL0:v[0-9]+]], [[VAL1:v[0-9]+]] offset1:8
; SI: ds_write2_b32 [[BASEADDR:v[0-9]+]], [[VAL1:v[0-9]+]], [[VAL1]] offset0:8 offset1:27		; SI: ds_write2_b32 [[BASEADDR:v[0-9]+]], [[VAL0]], [[VAL1]] offset0:11 offset1:27
; SI: s_endpgm		; SI: s_endpgm
define void @simple_write2_two_val_f32_x2(float addrspace(1)* %C, float addrspace(1)* %in0, float addrspace(1)* %in1) #0 {		define void @simple_write2_two_val_f32_x2(float addrspace(1)* %C, float addrspace(1)* %in0, float addrspace(1)* %in1) #0 {
%tid.x = tail call i32 @llvm.amdgcn.workitem.id.x() #1		%tid.x = tail call i32 @llvm.amdgcn.workitem.id.x() #1
%in0.gep = getelementptr float, float addrspace(1)* %in0, i32 %tid.x		%in0.gep = getelementptr float, float addrspace(1)* %in0, i32 %tid.x
%in1.gep = getelementptr float, float addrspace(1)* %in1, i32 %tid.x		%in1.gep = getelementptr float, float addrspace(1)* %in1, i32 %tid.x
%val0 = load float, float addrspace(1)* %in0.gep, align 4		%val0 = load float, float addrspace(1)* %in0.gep, align 4
%val1 = load float, float addrspace(1)* %in1.gep, align 4		%val1 = load float, float addrspace(1)* %in1.gep, align 4

[12 lines not shown; context: define void @simple_write2_two_val_f32_x2(float addrspace(1)* %C, float addrspace(1)* %in0, float addrspace(1)* %in1) #0 {]
%idx.3 = add nsw i32 %tid.x, 27		%idx.3 = add nsw i32 %tid.x, 27
%arrayidx3 = getelementptr inbounds [512 x float], [512 x float] addrspace(3)* @lds, i32 0, i32 %idx.3		%arrayidx3 = getelementptr inbounds [512 x float], [512 x float] addrspace(3)* @lds, i32 0, i32 %idx.3
store float %val1, float addrspace(3)* %arrayidx3, align 4		store float %val1, float addrspace(3)* %arrayidx3, align 4

ret void		ret void
}		}

; SI-LABEL: @simple_write2_two_val_f32_x2_nonzero_base		; SI-LABEL: @simple_write2_two_val_f32_x2_nonzero_base
; SI: ds_write2_b32 [[BASEADDR:v[0-9]+]], [[VAL0:v[0-9]+]], [[VAL0]] offset0:3 offset1:11		; SI: ds_write2_b32 [[BASEADDR:v[0-9]+]], [[VAL0:v[0-9]+]], [[VAL1:v[0-9]+]] offset0:3 offset1:8
; SI: ds_write2_b32 [[BASEADDR:v[0-9]+]], [[VAL1:v[0-9]+]], [[VAL1]] offset0:8 offset1:27		; SI: ds_write2_b32 [[BASEADDR:v[0-9]+]], [[VAL0]], [[VAL1]] offset0:11 offset1:27
; SI: s_endpgm		; SI: s_endpgm
define void @simple_write2_two_val_f32_x2_nonzero_base(float addrspace(1)* %C, float addrspace(1)* %in0, float addrspace(1)* %in1) #0 {		define void @simple_write2_two_val_f32_x2_nonzero_base(float addrspace(1)* %C, float addrspace(1)* %in0, float addrspace(1)* %in1) #0 {
%tid.x = tail call i32 @llvm.amdgcn.workitem.id.x() #1		%tid.x = tail call i32 @llvm.amdgcn.workitem.id.x() #1
%in0.gep = getelementptr float, float addrspace(1)* %in0, i32 %tid.x		%in0.gep = getelementptr float, float addrspace(1)* %in0, i32 %tid.x
%in1.gep = getelementptr float, float addrspace(1)* %in1, i32 %tid.x		%in1.gep = getelementptr float, float addrspace(1)* %in1, i32 %tid.x
%val0 = load float, float addrspace(1)* %in0.gep, align 4		%val0 = load float, float addrspace(1)* %in0.gep, align 4
%val1 = load float, float addrspace(1)* %in1.gep, align 4		%val1 = load float, float addrspace(1)* %in1.gep, align 4

[215 lines not shown]
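
Note on the offset operands in these CHECK lines: the two-address DS instructions store offset0 and offset1 as 8-bit fields counted in elements (4 bytes for the _b32 forms, 8 bytes for the _b64 forms). That is consistent with the other test updates in this revision, for example a ds_write_b32 at offset:400 becoming offset1:100 in si-triv-disjoint-mem-access.ll, and it is presumably why the element index 257 in simple_write2_two_val_too_far_f32 is out of reach. The sketch below is a hypothetical helper, not code from SILoadStoreOptimizer.cpp. The names DS2Offsets, encodeDS2Offsets and checkExamples, the choice to emit the smaller offset first, and the rejection of equal offsets are assumptions made for illustration, and the st64 instruction forms are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Result of trying to encode two byte offsets as the offset0/offset1 fields
// of a ds_read2/ds_write2 instruction. Hypothetical type for this sketch.
struct DS2Offsets {
  bool Valid;
  uint8_t Offset0;
  uint8_t Offset1;
};

// Hypothetical helper (not taken from SILoadStoreOptimizer.cpp).
// ByteOff0/ByteOff1 are byte offsets from a common base address; EltSize is
// 4 for the _b32 forms and 8 for the _b64 forms. The st64 forms are ignored.
DS2Offsets encodeDS2Offsets(uint32_t ByteOff0, uint32_t ByteOff1,
                            uint32_t EltSize) {
  const DS2Offsets Invalid = {false, 0, 0};
  if (EltSize != 4 && EltSize != 8)
    return Invalid;
  // The fields are in element units, so both offsets must be element-aligned.
  if (ByteOff0 % EltSize != 0 || ByteOff1 % EltSize != 0)
    return Invalid;
  uint32_t Elt0 = ByteOff0 / EltSize;
  uint32_t Elt1 = ByteOff1 / EltSize;
  // Assumption of this sketch: identical addresses are not merged.
  if (Elt0 == Elt1)
    return Invalid;
  // Assumption of this sketch: the smaller offset is emitted first.
  if (Elt0 > Elt1)
    std::swap(Elt0, Elt1);
  // Each field is only 8 bits wide.
  if (Elt1 > 255)
    return Invalid;
  DS2Offsets Result = {true, static_cast<uint8_t>(Elt0),
                       static_cast<uint8_t>(Elt1)};
  return Result;
}

// Usage, mirroring offsets that appear in the updated tests.
void checkExamples() {
  DS2Offsets A = encodeDS2Offsets(12, 400, 4); // offset0:3 offset1:100
  assert(A.Valid && A.Offset0 == 3 && A.Offset1 == 100);
  DS2Offsets B = encodeDS2Offsets(16, 24, 8); // offset0:2 offset1:3
  assert(B.Valid && B.Offset0 == 2 && B.Offset1 == 3);
  DS2Offsets C = encodeDS2Offsets(0, 257 * 4, 4); // element 257 does not fit
  assert(!C.Valid);
}
```

The real pass has to check more than the offsets (same base register, no intervening dependent instructions, and so on); this only illustrates the encoding limit that the CHECK lines reflect.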

llvm/trunk/test/CodeGen/AMDGPU/fceil64.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s \| FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s \| FileCheck -check-prefix=SI -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs < %s \| FileCheck -check-prefix=CI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs < %s \| FileCheck -check-prefix=CI -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s \| FileCheck -check-prefix=CI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s \| FileCheck -check-prefix=CI -check-prefix=FUNC %s

	declare double @llvm.ceil.f64(double) nounwind readnone			declare double @llvm.ceil.f64(double) nounwind readnone
	declare <2 x double> @llvm.ceil.v2f64(<2 x double>) nounwind readnone			declare <2 x double> @llvm.ceil.v2f64(<2 x double>) nounwind readnone
	declare <3 x double> @llvm.ceil.v3f64(<3 x double>) nounwind readnone			declare <3 x double> @llvm.ceil.v3f64(<3 x double>) nounwind readnone
	declare <4 x double> @llvm.ceil.v4f64(<4 x double>) nounwind readnone			declare <4 x double> @llvm.ceil.v4f64(<4 x double>) nounwind readnone
	declare <8 x double> @llvm.ceil.v8f64(<8 x double>) nounwind readnone			declare <8 x double> @llvm.ceil.v8f64(<8 x double>) nounwind readnone
	declare <16 x double> @llvm.ceil.v16f64(<16 x double>) nounwind readnone			declare <16 x double> @llvm.ceil.v16f64(<16 x double>) nounwind readnone

	; FUNC-LABEL: {{^}}fceil_f64:			; FUNC-LABEL: {{^}}fceil_f64:
	; CI: v_ceil_f64_e32			; CI: v_ceil_f64_e32
	; SI: s_bfe_u32 [[SEXP:s[0-9]+]], {{s[0-9]+}}, 0xb0014			; SI: s_bfe_u32 [[SEXP:s[0-9]+]], {{s[0-9]+}}, 0xb0014
	; SI-DAG: s_and_b32 s{{[0-9]+}}, s{{[0-9]+}}, 0x80000000			; SI-DAG: s_and_b32 s{{[0-9]+}}, s{{[0-9]+}}, 0x80000000
	; SI-DAG: s_add_i32 [[SEXP1:s[0-9]+]], [[SEXP]], 0xfffffc01			; SI-DAG: s_addk_i32 [[SEXP]], 0xfc01
	; SI-DAG: s_lshr_b64 s[{{[0-9]+:[0-9]+}}], s[{{[0-9]+:[0-9]+}}], [[SEXP1]]			; SI-DAG: s_lshr_b64 s[{{[0-9]+:[0-9]+}}], s[{{[0-9]+:[0-9]+}}], [[SEXP]]
	; SI-DAG: s_not_b64			; SI-DAG: s_not_b64
	; SI-DAG: s_and_b64			; SI-DAG: s_and_b64
	; SI-DAG: cmp_gt_i32			; SI-DAG: cmp_gt_i32
	; SI-DAG: cndmask_b32			; SI-DAG: cndmask_b32
	; SI-DAG: cndmask_b32			; SI-DAG: cndmask_b32
	; SI-DAG: cmp_lt_i32			; SI-DAG: cmp_lt_i32
	; SI-DAG: cndmask_b32			; SI-DAG: cndmask_b32
	; SI-DAG: cndmask_b32			; SI-DAG: cndmask_b32
	[79 lines not shown]
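
Note on the fceil_f64 constants: the SI CHECK lines change from s_add_i32 with the literal 0xfffffc01 to s_addk_i32 with 0xfc01. Assuming s_addk_i32 sign-extends its 16-bit immediate, as the SOPK encoding is documented to do, both forms add -1023, which removes the f64 exponent bias. The 0xb0014 operand of s_bfe_u32 selects an 11-bit field starting at bit 20. The sketch below models only this arithmetic; the function names are hypothetical and none of it is code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend a 16-bit immediate to 32 bits without relying on
// implementation-defined narrowing conversions.
int32_t signExtend16(uint16_t Imm) {
  return static_cast<int32_t>(Imm ^ 0x8000u) - 0x8000;
}

// Model of s_add_i32 with a 32-bit literal (wrapping add, carry ignored).
uint32_t sAddI32(uint32_t Src, uint32_t Literal) { return Src + Literal; }

// Model of s_addk_i32: destination += sign-extended 16-bit immediate.
uint32_t sAddkI32(uint32_t Dst, uint16_t Simm16) {
  return Dst + static_cast<uint32_t>(signExtend16(Simm16));
}

// Model of s_bfe_u32: bits [4:0] of Ctl give the field offset and
// bits [22:16] give the field width.
uint32_t sBfeU32(uint32_t Src, uint32_t Ctl) {
  uint32_t Offset = Ctl & 0x1f;
  uint32_t Width = (Ctl >> 16) & 0x7f;
  if (Width == 0)
    return 0;
  uint32_t Mask = Width >= 32 ? 0xffffffffu : ((1u << Width) - 1u);
  return (Src >> Offset) & Mask;
}

// Usage with the constants from the CHECK lines.
void checkFceilConstants() {
  assert(signExtend16(0xfc01) == -1023);
  // The high dword of the double 1.0 is 0x3ff00000, biased exponent 1023.
  uint32_t Exp = sBfeU32(0x3ff00000u, 0xb0014);
  assert(Exp == 1023);
  // Both instruction forms produce the same unbiased exponent.
  assert(sAddkI32(Exp, 0xfc01) == sAddI32(Exp, 0xfffffc01u));
  assert(sAddkI32(Exp, 0xfc01) == 0);
}
```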

llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.rsq.clamp.ll

[19 lines not shown; context: define void @rsq_clamp_f32(float addrspace(1)* %out, float %src) #0 {]
ret void		ret void
}		}


; FUNC-LABEL: {{^}}rsq_clamp_f64:		; FUNC-LABEL: {{^}}rsq_clamp_f64:
; SI: v_rsq_clamp_f64_e32		; SI: v_rsq_clamp_f64_e32

; TODO: this constant should be folded:		; TODO: this constant should be folded:
; VI-DAG: s_mov_b32 s[[LOW1:[0-9+]]], -1		; VI-DAG: s_mov_b32 [[NEG1:s[0-9+]]], -1
		; VI-DAG: s_mov_b32 s[[LOW1:[0-9+]]], [[NEG1]]
; VI-DAG: s_mov_b32 s[[HIGH1:[0-9+]]], 0x7fefffff		; VI-DAG: s_mov_b32 s[[HIGH1:[0-9+]]], 0x7fefffff
; VI-DAG: s_mov_b32 s[[HIGH2:[0-9+]]], 0xffefffff		; VI-DAG: s_mov_b32 s[[HIGH2:[0-9+]]], 0xffefffff
; VI-DAG: v_rsq_f64_e32 [[RSQ:v\[[0-9]+:[0-9]+\]]], s[{{[0-9]+:[0-9]+}}		; VI-DAG: v_rsq_f64_e32 [[RSQ:v\[[0-9]+:[0-9]+\]]], s[{{[0-9]+:[0-9]+}}
; VI-DAG: v_min_f64 v[0:1], [[RSQ]], s{{\[}}[[LOW1]]:[[HIGH1]]]		; VI-DAG: v_min_f64 v[0:1], [[RSQ]], s{{\[}}[[LOW1]]:[[HIGH1]]]
; VI-DAG: v_max_f64 v[0:1], v[0:1], s{{\[}}[[LOW1]]:[[HIGH2]]]		; VI-DAG: v_max_f64 v[0:1], v[0:1], s{{\[}}[[LOW1]]:[[HIGH2]]]
define void @rsq_clamp_f64(double addrspace(1)* %out, double %src) #0 {		define void @rsq_clamp_f64(double addrspace(1)* %out, double %src) #0 {
%rsq_clamp = call double @llvm.amdgcn.rsq.clamp.f64(double %src)		%rsq_clamp = call double @llvm.amdgcn.rsq.clamp.f64(double %src)
store double %rsq_clamp, double addrspace(1)* %out		store double %rsq_clamp, double addrspace(1)* %out
[13 lines not shown]

llvm/trunk/test/CodeGen/AMDGPU/load-local-i16.ll

	[67 lines not shown]
	define void @local_load_v8i16(<8 x i16> addrspace(3)* %out, <8 x i16> addrspace(3)* %in) {			define void @local_load_v8i16(<8 x i16> addrspace(3)* %out, <8 x i16> addrspace(3)* %in) {
	entry:			entry:
	%ld = load <8 x i16>, <8 x i16> addrspace(3)* %in			%ld = load <8 x i16>, <8 x i16> addrspace(3)* %in
	store <8 x i16> %ld, <8 x i16> addrspace(3)* %out			store <8 x i16> %ld, <8 x i16> addrspace(3)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}local_load_v16i16:			; FUNC-LABEL: {{^}}local_load_v16i16:
	; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:2 offset1:3{{$}}			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset1:3{{$}}
	; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset1:1{{$}}			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:1 offset1:2{{$}}


	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET

	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	[196 lines not shown]
	; EG-DAG: BFE_INT			; EG-DAG: BFE_INT
	define void @local_sextload_v8i16_to_v8i32(<8 x i32> addrspace(3)* %out, <8 x i16> addrspace(3)* %in) #0 {			define void @local_sextload_v8i16_to_v8i32(<8 x i32> addrspace(3)* %out, <8 x i16> addrspace(3)* %in) #0 {
	%load = load <8 x i16>, <8 x i16> addrspace(3)* %in			%load = load <8 x i16>, <8 x i16> addrspace(3)* %in
	%ext = sext <8 x i16> %load to <8 x i32>			%ext = sext <8 x i16> %load to <8 x i32>
	store <8 x i32> %ext, <8 x i32> addrspace(3)* %out			store <8 x i32> %ext, <8 x i32> addrspace(3)* %out
	ret void			ret void
	}			}

	; FIXME: Should have 2 ds_read_b64
	; FUNC-LABEL: {{^}}local_zextload_v16i16_to_v16i32:			; FUNC-LABEL: {{^}}local_zextload_v16i16_to_v16i32:
	; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:1 offset1:2{{$}}			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset1:1{{$}}
	; GCN-DAG: ds_read_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+$}}			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:2 offset1:3{{$}}
	; GCN-DAG: ds_read_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset:24

	; GCN: ds_write2_b64			; GCN: ds_write2_b64
	; GCN: ds_write2_b64			; GCN: ds_write2_b64
	; GCN: ds_write2_b64			; GCN: ds_write2_b64
	; GCN: ds_write2_b64			; GCN: ds_write2_b64

	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	define void @local_zextload_v16i16_to_v16i32(<16 x i32> addrspace(3)* %out, <16 x i16> addrspace(3)* %in) #0 {			define void @local_zextload_v16i16_to_v16i32(<16 x i32> addrspace(3)* %out, <16 x i16> addrspace(3)* %in) #0 {
	%load = load <16 x i16>, <16 x i16> addrspace(3)* %in			%load = load <16 x i16>, <16 x i16> addrspace(3)* %in
	%ext = zext <16 x i16> %load to <16 x i32>			%ext = zext <16 x i16> %load to <16 x i32>
	store <16 x i32> %ext, <16 x i32> addrspace(3)* %out			store <16 x i32> %ext, <16 x i32> addrspace(3)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}local_sextload_v16i16_to_v16i32:			; FUNC-LABEL: {{^}}local_sextload_v16i16_to_v16i32:
	; GCN-DAG: ds_read_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+$}}
	; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:1 offset1:3{{$}}			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset1:1{{$}}
	; GCN-DAG: ds_read_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset:16{{$}}			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:2 offset1:3{{$}}

	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	[46 lines not shown]
	define void @local_zextload_v32i16_to_v32i32(<32 x i32> addrspace(3)* %out, <32 x i16> addrspace(3)* %in) #0 {			define void @local_zextload_v32i16_to_v32i32(<32 x i32> addrspace(3)* %out, <32 x i16> addrspace(3)* %in) #0 {
	%load = load <32 x i16>, <32 x i16> addrspace(3)* %in			%load = load <32 x i16>, <32 x i16> addrspace(3)* %in
	%ext = zext <32 x i16> %load to <32 x i32>			%ext = zext <32 x i16> %load to <32 x i32>
	store <32 x i32> %ext, <32 x i32> addrspace(3)* %out			store <32 x i32> %ext, <32 x i32> addrspace(3)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}local_sextload_v32i16_to_v32i32:			; FUNC-LABEL: {{^}}local_sextload_v32i16_to_v32i32:
	; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:1 offset1:2{{$}}			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:2 offset1:3{{$}}
	; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:3 offset1:4			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:4 offset1:5
	; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset1:5{{$}}			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset1:1{{$}}
	; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:6 offset1:7			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:6 offset1:7
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:14 offset1:15
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:12 offset1:13
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:10 offset1:11
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:8 offset1:9
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:6 offset1:7
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:4 offset1:5
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:2 offset1:3
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset1:1

	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	define void @local_sextload_v32i16_to_v32i32(<32 x i32> addrspace(3)* %out, <32 x i16> addrspace(3)* %in) #0 {			define void @local_sextload_v32i16_to_v32i32(<32 x i32> addrspace(3)* %out, <32 x i16> addrspace(3)* %in) #0 {
	%load = load <32 x i16>, <32 x i16> addrspace(3)* %in			%load = load <32 x i16>, <32 x i16> addrspace(3)* %in
	%ext = sext <32 x i16> %load to <32 x i32>			%ext = sext <32 x i16> %load to <32 x i32>
	store <32 x i32> %ext, <32 x i32> addrspace(3)* %out			store <32 x i32> %ext, <32 x i32> addrspace(3)* %out
	ret void			ret void
	}			}

	; FIXME: Missed read2
	; FUNC-LABEL: {{^}}local_zextload_v64i16_to_v64i32:			; FUNC-LABEL: {{^}}local_zextload_v64i16_to_v64i32:
	; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:11 offset1:15			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:14 offset1:15
	; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset1:1{{$}}			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset1:1{{$}}
	; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:2 offset1:3			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:2 offset1:3
	; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:4 offset1:5			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:4 offset1:5
	; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:6 offset1:7			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:6 offset1:7
	; GCN-DAG: ds_read_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset:64			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:8 offset1:9
	; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:9 offset1:10
	; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:12 offset1:13			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:12 offset1:13
	; GCN-DAG: ds_read_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset:112			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:10 offset1:11
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:30 offset1:31
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:28 offset1:29
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:26 offset1:27
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:24 offset1:25
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:22 offset1:23
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:20 offset1:21
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:18 offset1:19
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:16 offset1:17
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:14 offset1:15
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:12 offset1:13
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:10 offset1:11
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:8 offset1:9
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:6 offset1:7
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:4 offset1:5
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:2 offset1:3
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset1:1

	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	; EG: LDS_READ_RET			; EG: LDS_READ_RET
	[376 lines not shown]

llvm/trunk/test/CodeGen/AMDGPU/load-local-i32.ll

	[50 lines not shown]
	define void @local_load_v8i32(<8 x i32> addrspace(3)* %out, <8 x i32> addrspace(3)* %in) #0 {			define void @local_load_v8i32(<8 x i32> addrspace(3)* %out, <8 x i32> addrspace(3)* %in) #0 {
	entry:			entry:
	%ld = load <8 x i32>, <8 x i32> addrspace(3)* %in			%ld = load <8 x i32>, <8 x i32> addrspace(3)* %in
	store <8 x i32> %ld, <8 x i32> addrspace(3)* %out			store <8 x i32> %ld, <8 x i32> addrspace(3)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}local_load_v16i32:			; FUNC-LABEL: {{^}}local_load_v16i32:
	; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:3 offset1:4{{$}}			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:6 offset1:7{{$}}
	; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:5 offset1:6{{$}}			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:4 offset1:5{{$}}
	; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset1:7{{$}}			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:2 offset1:3{{$}}
	; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset0:1 offset1:2{{$}}			; GCN-DAG: ds_read2_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}} offset1:1{{$}}
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:6 offset1:7
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:4 offset1:5
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset0:2 offset1:3
				; GCN-DAG: ds_write2_b64 v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}} offset1:1
	define void @local_load_v16i32(<16 x i32> addrspace(3)* %out, <16 x i32> addrspace(3)* %in) #0 {			define void @local_load_v16i32(<16 x i32> addrspace(3)* %out, <16 x i32> addrspace(3)* %in) #0 {
	entry:			entry:
	%ld = load <16 x i32>, <16 x i32> addrspace(3)* %in			%ld = load <16 x i32>, <16 x i32> addrspace(3)* %in
	store <16 x i32> %ld, <16 x i32> addrspace(3)* %out			store <16 x i32> %ld, <16 x i32> addrspace(3)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}local_zextload_i32_to_i64:			; FUNC-LABEL: {{^}}local_zextload_i32_to_i64:
	[112 lines not shown]

llvm/trunk/test/CodeGen/AMDGPU/local-memory.amdgcn.ll

	[38 lines not shown]
	; Check that the LDS size emitted correctly			; Check that the LDS size emitted correctly
	; EG: .long 166120			; EG: .long 166120
	; EG-NEXT: .long 8			; EG-NEXT: .long 8
	; GCN: .long 47180			; GCN: .long 47180
	; GCN-NEXT: .long 32900			; GCN-NEXT: .long 32900

	; GCN-LABEL: {{^}}local_memory_two_objects:			; GCN-LABEL: {{^}}local_memory_two_objects:
	; GCN: v_lshlrev_b32_e32 [[ADDRW:v[0-9]+]], 2, v0			; GCN: v_lshlrev_b32_e32 [[ADDRW:v[0-9]+]], 2, v0
	; CI-DAG: ds_write_b32 [[ADDRW]], {{v[0-9]*}} offset:16			; CI-DAG: ds_write2_b32 [[ADDRW]], {{v[0-9]+}}, {{v[0-9]+}} offset1:4
	; CI-DAG: ds_write_b32 [[ADDRW]], {{v[0-9]*$}}

	; SI: v_add_i32_e32 [[ADDRW_OFF:v[0-9]+]], vcc, 16, [[ADDRW]]			; SI: v_add_i32_e32 [[ADDRW_OFF:v[0-9]+]], vcc, 16, [[ADDRW]]

	; SI-DAG: ds_write_b32 [[ADDRW]],			; SI-DAG: ds_write_b32 [[ADDRW]],
	; SI-DAG: ds_write_b32 [[ADDRW_OFF]],			; SI-DAG: ds_write_b32 [[ADDRW_OFF]],

	; GCN: s_barrier			; GCN: s_barrier

	[36 lines not shown]

llvm/trunk/test/CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll

[150 lines not shown; context: define void @reorder_global_load_local_store_global_load(i32 addrspace(1)* %out, i32 addrspace(3)* %lptr, i32 addrspace(1)* %ptr0) #0 {]
%add = add nsw i32 %tmp1, %tmp2		%add = add nsw i32 %tmp1, %tmp2

store i32 %add, i32 addrspace(1)* %out, align 4		store i32 %add, i32 addrspace(1)* %out, align 4
ret void		ret void
}		}

; FUNC-LABEL: @reorder_local_offsets		; FUNC-LABEL: @reorder_local_offsets
; CI: ds_read2_b32 {{v\[[0-9]+:[0-9]+\]}}, {{v[0-9]+}} offset0:100 offset1:102		; CI: ds_read2_b32 {{v\[[0-9]+:[0-9]+\]}}, {{v[0-9]+}} offset0:100 offset1:102
; CI: ds_write_b32 {{v[0-9]+}}, {{v[0-9]+}} offset:400		; CI: ds_write2_b32 {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}} offset0:3 offset1:100
		; CI: ds_read_b32 {{v[0-9]+}}, {{v[0-9]+}} offset:12
; CI: ds_write_b32 {{v[0-9]+}}, {{v[0-9]+}} offset:408		; CI: ds_write_b32 {{v[0-9]+}}, {{v[0-9]+}} offset:408
; CI: buffer_store_dword		; CI: buffer_store_dword
; CI: s_endpgm		; CI: s_endpgm
define void @reorder_local_offsets(i32 addrspace(1)* nocapture %out, i32 addrspace(1)* noalias nocapture readnone %gptr, i32 addrspace(3)* noalias nocapture %ptr0) #0 {		define void @reorder_local_offsets(i32 addrspace(1)* nocapture %out, i32 addrspace(1)* noalias nocapture readnone %gptr, i32 addrspace(3)* noalias nocapture %ptr0) #0 {
%ptr1 = getelementptr inbounds i32, i32 addrspace(3)* %ptr0, i32 3		%ptr1 = getelementptr inbounds i32, i32 addrspace(3)* %ptr0, i32 3
%ptr2 = getelementptr inbounds i32, i32 addrspace(3)* %ptr0, i32 100		%ptr2 = getelementptr inbounds i32, i32 addrspace(3)* %ptr0, i32 100
%ptr3 = getelementptr inbounds i32, i32 addrspace(3)* %ptr0, i32 102		%ptr3 = getelementptr inbounds i32, i32 addrspace(3)* %ptr0, i32 102

[66 lines not shown]

llvm/trunk/test/CodeGen/AMDGPU/store-v3i64.ll

	[40 lines not shown]
	; GCN: buffer_store_byte			; GCN: buffer_store_byte
	; GCN: buffer_store_byte			; GCN: buffer_store_byte
	define void @global_store_v3i64_unaligned(<3 x i64> addrspace(1)* %out, <3 x i64> %x) {			define void @global_store_v3i64_unaligned(<3 x i64> addrspace(1)* %out, <3 x i64> %x) {
	store <3 x i64> %x, <3 x i64> addrspace(1)* %out, align 1			store <3 x i64> %x, <3 x i64> addrspace(1)* %out, align 1
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}local_store_v3i64:			; GCN-LABEL: {{^}}local_store_v3i64:
	; GCN: ds_write_b64			; GCN: ds_write2_b64
	; GCN: ds_write_b64
	; GCN: ds_write_b64			; GCN: ds_write_b64
	define void @local_store_v3i64(<3 x i64> addrspace(3)* %out, <3 x i64> %x) {			define void @local_store_v3i64(<3 x i64> addrspace(3)* %out, <3 x i64> %x) {
	store <3 x i64> %x, <3 x i64> addrspace(3)* %out, align 32			store <3 x i64> %x, <3 x i64> addrspace(3)* %out, align 32
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}local_store_v3i64_unaligned:			; GCN-LABEL: {{^}}local_store_v3i64_unaligned:
	; GCN: ds_write_b8			; GCN: ds_write_b8
	[70 lines not shown]

llvm/trunk/test/CodeGen/AMDGPU/use-sgpr-multiple-times.ll

	[36 lines not shown]
	; GCN: buffer_store_dword [[RESULT]]			; GCN: buffer_store_dword [[RESULT]]
	define void @test_sgpr_use_twice_ternary_op_a_a_b(float addrspace(1)* %out, float %a, float %b) #0 {			define void @test_sgpr_use_twice_ternary_op_a_a_b(float addrspace(1)* %out, float %a, float %b) #0 {
	%fma = call float @llvm.fma.f32(float %a, float %a, float %b) #1			%fma = call float @llvm.fma.f32(float %a, float %a, float %b) #1
	store float %fma, float addrspace(1)* %out, align 4			store float %fma, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}test_use_s_v_s:			; GCN-LABEL: {{^}}test_use_s_v_s:
	; GCN-DAG: s_load_dword [[SA:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, {{0xb\|0x2c}}
	; GCN-DAG: s_load_dword [[SB:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, {{0xc\|0x30}}

	; GCN: buffer_load_dword [[VA0:v[0-9]+]]			; GCN: buffer_load_dword [[VA0:v[0-9]+]]
	; GCN-NOT: v_mov_b32
	; GCN: buffer_load_dword [[VA1:v[0-9]+]]			; GCN: buffer_load_dword [[VA1:v[0-9]+]]
				; GCN-DAG: s_load_dword [[SA:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, {{0xb\|0x2c}}
				; GCN-DAG: s_load_dword [[SB:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, {{0xc\|0x30}}

	; GCN-NOT: v_mov_b32			; GCN-NOT: v_mov_b32
	; GCN: v_mov_b32_e32 [[VB:v[0-9]+]], [[SB]]			; GCN: v_mov_b32_e32 [[VB:v[0-9]+]], [[SB]]
	; GCN-NOT: v_mov_b32			; GCN-NOT: v_mov_b32

	; GCN-DAG: v_fma_f32 [[RESULT0:v[0-9]+]], [[VA0]], [[SA]], [[VB]]			; GCN-DAG: v_fma_f32 [[RESULT0:v[0-9]+]], [[VA0]], [[SA]], [[VB]]
	; GCN-DAG: v_fma_f32 [[RESULT1:v[0-9]+]], [[VA1]], [[SA]], [[VB]]			; GCN-DAG: v_fma_f32 [[RESULT1:v[0-9]+]], [[VA1]], [[SA]], [[VB]]
	; GCN: buffer_store_dword [[RESULT0]]			; GCN: buffer_store_dword [[RESULT0]]
	[212 lines not shown]


Revision Contents

Diff 69600

llvm/trunk/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

llvm/trunk/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp

llvm/trunk/test/CodeGen/AMDGPU/ds_read2_offset_order.ll

llvm/trunk/test/CodeGen/AMDGPU/ds_write2.ll

llvm/trunk/test/CodeGen/AMDGPU/fceil64.ll

llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.rsq.clamp.ll

llvm/trunk/test/CodeGen/AMDGPU/load-local-i16.ll

llvm/trunk/test/CodeGen/AMDGPU/load-local-i32.ll

llvm/trunk/test/CodeGen/AMDGPU/local-memory.amdgcn.ll

llvm/trunk/test/CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll

llvm/trunk/test/CodeGen/AMDGPU/store-v3i64.ll

llvm/trunk/test/CodeGen/AMDGPU/use-sgpr-multiple-times.ll
