Details

Reviewers

arsenm
cfang

Commits

rG5b27072f5719: [AMDGPU] Do not insert an instruction into worklist twice in movetovalu
rL308039: [AMDGPU] Do not insert an instruction into worklist twice in movetovalu

Summary

In movetovalu, when we process an instruction in the worklist, we may delete/modify the instruction. So putting an
instruction in the worklist twice may result in handling a deleted/modified instruction, and thus cause trouble.

Diff Detail

Repository: rL LLVM

Event Timeline

cfang created this revision. Jun 27 2017, 4:39 PM

Herald added subscribers: t-tye, tpr, dstuttard and 4 others. Jun 27 2017, 4:39 PM

Why is it possible for an instruction here to end up in the worklist multiple times? I'm surprised we haven't hit this before and I'm not sure exactly when this happens. Can you reduce the test any further? I would expect only 2-3 actual instructions in it.

lib/Target/AMDGPU/SIInstrInfo.cpp
3860 ↗	(On Diff #104310)	Missing space
3947 ↗	(On Diff #104310)	Missing space
3947 ↗	(On Diff #104310)	I think the worklist can be pretty big. A brute force search through is probably bad.

This is the shortest test I can get! The instruction in question is the "and". It is first put
in the list because of tmp3, and the second time by tmp13.

define amdgpu_kernel void @in_worklist_once() #0 {
bb:
%tmp = load i64, i64* undef
br label %bb1

bb1: ; preds = %bb1, %bb
%tmp2 = phi i64 [ undef, %bb ], [ %tmp16, %bb1 ]
%tmp3 = phi i64 [ %tmp, %bb ], [ undef, %bb1 ]
%tmp11 = lshr i64 %tmp2, 14
%tmp13 = xor i64 %tmp11, %tmp2
%tmp15 = and i64 %tmp3, %tmp13
%tmp16 = xor i64 %tmp15, %tmp3
br label %bb1
}

lib/Target/AMDGPU/SIInstrInfo.cpp
3947 ↗	(On Diff #104310)	What's your suggestion for the search in smallvector?

Update based on Matt's comments.

TODO:

improve the search in the worklist!

Hi, is it possible to upload a full diff?

lib/Target/AMDGPU/SIInstrInfo.cpp
3947 ↗	(On Diff #104310)	I would suggest to use hash for O(1) search.

I'm in the process of adding a SmallSet Workset alongside the SmallVectorImpl used for the Worklist, to avoid searching through the Worklist every time before insertion. I will send out a new diff when it's working.

Changpeng Fang is on vacation and I'm trying to complete the code review for him.

Changpeng's newest test case did show the "and" instr being reached twice, through the 2 phi nodes. I tried an early version of CL 1419065 and it crashed as well, so the problem did exist before. When a node is inserted into the Worklist twice and entries are processed last in, first out, the first pop may modify the instr or replace it with new ones, so the second pop may get a junk node. I guess it was only by luck that the problem did not reveal itself earlier. It is imperative to avoid pushing an instr into the Worklist twice.
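The failure mode described in this comment can be illustrated outside LLVM. The following is a minimal sketch and not the SIInstrInfo code: `FakeInst`, its `Erased` flag, and `processWorklist` are hypothetical names, and the flag stands in for an instruction that the real pass would have deleted or replaced.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a MachineInstr. The Erased flag models an
// instruction that has been deleted or replaced while being processed.
struct FakeInst {
  bool Erased = false;
};

// Pops entries last in, first out, "processes" each one by marking it
// erased, and returns how many pops yielded an already-erased entry.
inline int processWorklist(std::vector<FakeInst *> Worklist) {
  int StalePops = 0;
  while (!Worklist.empty()) {
    FakeInst *Inst = Worklist.back();
    Worklist.pop_back();
    if (Inst->Erased) {
      // In the real pass this pointer would refer to a junk node.
      ++StalePops;
      continue;
    }
    Inst->Erased = true;
  }
  return StalePops;
}
```

Pushing the same entry twice, as happens for the "and" in the test case, makes the second pop observe an entry that has already been processed; a worklist without duplicates never does.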

To avoid searching for whether an instr has already been entered into the Worklist, I am changing the Worklist type from SmallPtrVector to SmallPtrSet. SmallPtrSet's insert ensures that an item is not entered twice. There is a catch: SmallPtrVector access is last in, first out, while SmallPtrSet is first in, first out, so the order of register usage may differ from before. I therefore added a linear search to get the last item entered. I don't believe this costs much in performance, and it guarantees the generated code is the same as before.

t-tye added inline comments. Jul 9 2017, 12:37 PM

lib/Target/AMDGPU/SIInstrInfo.cpp
3415–3418 ↗	(On Diff #105693)	@alfred.j.huang you mention "SmallPtrVector access is last in first out". Is that true when the SmallPtrVector exceeds the size of the small number? If not then if the worklist gets larger than 32 it will no longer be deterministic in the order it returns the work list items which IIRC is against the LLVM policy which wants compilation to be consistent from run to run.

Thanks for mentioning this. I have reservations about the order myself. I was thinking reverse iteration might work due to https://reviews.llvm.org/D26718. There is also the "LLVM_ENABLE_REVERSE_ITERATION" define and -mllvm reverse-iterate, which, when used in a single compilation, make the SmallPtrSet iterate in reverse order. But I think you are right: SmallPtrSet is actually an unordered container, and anything beyond the 32 small entries is hashed, so there is really no order, and talking about iteration order is a bit strange here.

So back to the original question: when the original SmallPtrVector was used, we were traversing the "USE" chain of a previous dest register. We inserted the users into the Worklist in use order and processed them by popping, so they were handled in reverse order. I'm not really sure whether the push and pop order matters at all. I noticed the difference only when one llvm lit test showed " v_add_i32_e32 v0, vcc, v1, v0" versus " v_add_i32_e32 v0, vcc, v0, v1", which in theory are the same. If testing with the ocl tests, conformance, and llvm lit shows no difference at runtime, can we actually ignore the ordering?

As a last resort, I can go back to my original changes, which are the safest but probably not the best for performance, since they use 2 containers: a SmallPtrVector for push and pop, plus a SmallPtrSet to avoid inserting duplicates. In this case the result is guaranteed to be the same as before.

A concern would be that, with the set, running the same test multiple times could give different results, because different pointer values change the iteration order. Just because the tests pass on your machine does not mean they will run on another machine and get the same result if the order relies on the hashed values of pointers.
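The ordering concern can be sketched with standard containers. This is only an illustration under stated assumptions: it uses `std::unordered_set` rather than LLVM's SmallPtrSet, and `hashOrder` and `sameElements` are hypothetical helper names. It shows that a pointer-keyed hash set is guaranteed to return the same elements, but not in any particular order.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <unordered_set>
#include <vector>

// Returns the pointers in the order a hash set iterates over them. The
// standard leaves this order unspecified; in practice it usually depends
// on the pointer values, which can differ from run to run.
inline std::vector<const int *> hashOrder(const std::vector<const int *> &In) {
  std::unordered_set<const int *> S(In.begin(), In.end());
  return std::vector<const int *>(S.begin(), S.end());
}

// Returns true if both sequences hold the same elements, ignoring order.
inline bool sameElements(std::vector<const int *> A,
                         std::vector<const int *> B) {
  std::sort(A.begin(), A.end(), std::less<const int *>());
  std::sort(B.begin(), B.end(), std::less<const int *>());
  return A == B;
}
```

Code that only needs membership can rely on `sameElements`-style properties; code whose output depends on the sequence returned by `hashOrder` cannot be expected to be reproducible.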

Ok. Point well taken, I will change back to using 2 containers: SmallPtrVector as before for insertion, working together with a SmallPtrSet for duplicate checking. A new patch will be submitted later. Thanks!

SetVector and SmallSetVector already exist

I did not know SetVector existed. It basically contains the sub-containers set_ and vector_, which is exactly what I need, instead of me imitating it with an explicit SmallSet and SmallVector. This should work. Thanks!

Switch to SmallSetVector in place of SmallPtrVector so an item is inserted into the container only once. The order of pushing and popping is the same as before.
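The set-plus-vector idea behind this update can be sketched with standard containers. This is a simplified illustration, not LLVM's SetVector: `MiniSetVector` is a hypothetical class, and it only mirrors the `insert`, `empty`, and `pop_back_val` calls that appear in the diff.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical miniature of a set-backed vector: the vector keeps
// insertion order for last-in, first-out pops, and the set rejects
// duplicates in O(1) on average.
template <typename T> class MiniSetVector {
  std::vector<T> Order;
  std::unordered_set<T> Seen;

public:
  // Returns true if V was newly inserted, false if it was already present.
  bool insert(const T &V) {
    if (!Seen.insert(V).second)
      return false;
    Order.push_back(V);
    return true;
  }

  bool empty() const { return Order.empty(); }

  // Removes and returns the most recently inserted value. The value is
  // also dropped from the set, so it may be inserted again later.
  T pop_back_val() {
    T V = Order.back();
    Order.pop_back();
    Seen.erase(V);
    return V;
  }
};
```

With this shape, the pop order matches a plain vector used as a stack, while a second insert of an entry that is still pending is ignored instead of creating a duplicate.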

Test is missing from patch

Sorry. I thought I was using the same test Changpeng created and posted before. Anyway, I have posted it here again and changed the GCN checks to make the test more meaningful.

LGTM

lib/Target/AMDGPU/SIInstrInfo.h
42 ↗	(On Diff #106531)	A better name would indicate it's for the moveToVALU worklist. I've thought about moving all of this code into the actual pass instead of SIInstrInfo, so it doesn't really matter.
test/CodeGen/AMDGPU/move-to-valu-worklist.ll
2 ↗	(On Diff #106531)	Can you add a comment describing the problem

This revision is now accepted and ready to land. Jul 13 2017, 2:25 PM

Closed by commit rL308039: [AMDGPU] Do not insert an instruction into worklist twice in movetovalu (authored by ajhuang). Jul 14 2017, 10:57 AM

This revision was automatically updated to reflect the committed changes.

Diff 106666

llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.h

	Show All 13 Lines


	#ifndef LLVM_LIB_TARGET_AMDGPU_SIINSTRINFO_H			#ifndef LLVM_LIB_TARGET_AMDGPU_SIINSTRINFO_H
	#define LLVM_LIB_TARGET_AMDGPU_SIINSTRINFO_H			#define LLVM_LIB_TARGET_AMDGPU_SIINSTRINFO_H

	#include "AMDGPUInstrInfo.h"			#include "AMDGPUInstrInfo.h"
	#include "SIDefines.h"			#include "SIDefines.h"
	#include "SIRegisterInfo.h"			#include "SIRegisterInfo.h"
				#include "llvm/ADT/SetVector.h"

	namespace llvm {			namespace llvm {

	class SIInstrInfo final : public AMDGPUInstrInfo {			class SIInstrInfo final : public AMDGPUInstrInfo {
	private:			private:
	const SIRegisterInfo RI;			const SIRegisterInfo RI;
	const SISubtarget &ST;			const SISubtarget &ST;

	// The the inverse predicate should have the negative value.			// The the inverse predicate should have the negative value.
	enum BranchPredicate {			enum BranchPredicate {
	INVALID_BR = 0,			INVALID_BR = 0,
	SCC_TRUE = 1,			SCC_TRUE = 1,
	SCC_FALSE = -1,			SCC_FALSE = -1,
	VCCNZ = 2,			VCCNZ = 2,
	VCCZ = -2,			VCCZ = -2,
	EXECNZ = -3,			EXECNZ = -3,
	EXECZ = 3			EXECZ = 3
	};			};

				typedef SmallSetVector<MachineInstr *, 32> SetVectorType;

	static unsigned getBranchOpcode(BranchPredicate Cond);			static unsigned getBranchOpcode(BranchPredicate Cond);
	static BranchPredicate getBranchPredicate(unsigned Opcode);			static BranchPredicate getBranchPredicate(unsigned Opcode);

	unsigned buildExtractSubReg(MachineBasicBlock::iterator MI,			unsigned buildExtractSubReg(MachineBasicBlock::iterator MI,
	MachineRegisterInfo &MRI,			MachineRegisterInfo &MRI,
	MachineOperand &SuperReg,			MachineOperand &SuperReg,
	const TargetRegisterClass *SuperRC,			const TargetRegisterClass *SuperRC,
	unsigned SubIdx,			unsigned SubIdx,
	const TargetRegisterClass *SubRC) const;			const TargetRegisterClass *SubRC) const;
	MachineOperand buildExtractSubRegOrImm(MachineBasicBlock::iterator MI,			MachineOperand buildExtractSubRegOrImm(MachineBasicBlock::iterator MI,
	MachineRegisterInfo &MRI,			MachineRegisterInfo &MRI,
	MachineOperand &SuperReg,			MachineOperand &SuperReg,
	const TargetRegisterClass *SuperRC,			const TargetRegisterClass *SuperRC,
	unsigned SubIdx,			unsigned SubIdx,
	const TargetRegisterClass *SubRC) const;			const TargetRegisterClass *SubRC) const;

	void swapOperands(MachineInstr &Inst) const;			void swapOperands(MachineInstr &Inst) const;

	void lowerScalarAbs(SmallVectorImpl<MachineInstr *> &Worklist,			void lowerScalarAbs(SetVectorType &Worklist,
	MachineInstr &Inst) const;			MachineInstr &Inst) const;

	void splitScalar64BitUnaryOp(SmallVectorImpl<MachineInstr *> &Worklist,			void splitScalar64BitUnaryOp(SetVectorType &Worklist,
	MachineInstr &Inst, unsigned Opcode) const;			MachineInstr &Inst, unsigned Opcode) const;

	void splitScalar64BitBinaryOp(SmallVectorImpl<MachineInstr *> &Worklist,			void splitScalar64BitBinaryOp(SetVectorType &Worklist,
	MachineInstr &Inst, unsigned Opcode) const;			MachineInstr &Inst, unsigned Opcode) const;

	void splitScalar64BitBCNT(SmallVectorImpl<MachineInstr *> &Worklist,			void splitScalar64BitBCNT(SetVectorType &Worklist,
	MachineInstr &Inst) const;			MachineInstr &Inst) const;
	void splitScalar64BitBFE(SmallVectorImpl<MachineInstr *> &Worklist,			void splitScalar64BitBFE(SetVectorType &Worklist,
	MachineInstr &Inst) const;			MachineInstr &Inst) const;
	void movePackToVALU(SmallVectorImpl<MachineInstr *> &Worklist,			void movePackToVALU(SetVectorType &Worklist,
	MachineRegisterInfo &MRI,			MachineRegisterInfo &MRI,
	MachineInstr &Inst) const;			MachineInstr &Inst) const;

	void addUsersToMoveToVALUWorklist(			void addUsersToMoveToVALUWorklist(
	unsigned Reg, MachineRegisterInfo &MRI,			unsigned Reg, MachineRegisterInfo &MRI,
	SmallVectorImpl<MachineInstr *> &Worklist) const;			SetVectorType &Worklist) const;

	void			void
	addSCCDefUsersToVALUWorklist(MachineInstr &SCCDefInst,			addSCCDefUsersToVALUWorklist(MachineInstr &SCCDefInst,
	SmallVectorImpl<MachineInstr *> &Worklist) const;			SetVectorType &Worklist) const;

	const TargetRegisterClass *			const TargetRegisterClass *
	getDestEquivalentVGPRClass(const MachineInstr &Inst) const;			getDestEquivalentVGPRClass(const MachineInstr &Inst) const;

	bool checkInstOffsetsDoNotOverlap(MachineInstr &MIa, MachineInstr &MIb) const;			bool checkInstOffsetsDoNotOverlap(MachineInstr &MIa, MachineInstr &MIb) const;

	unsigned findUsedSGPR(const MachineInstr &MI, int OpIndices[3]) const;			unsigned findUsedSGPR(const MachineInstr &MI, int OpIndices[3]) const;

	▲ Show 20 Lines • Show All 788 Lines • Show Last 20 Lines

llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.cpp

Show First 20 Lines • Show All 3,402 Lines • ▼ Show 20 Lines	if (SRsrcIdx != -1) {
// Update the instruction to use NewVaddr		// Update the instruction to use NewVaddr
VAddr->setReg(NewVAddr);		VAddr->setReg(NewVAddr);
// Update the instruction to use NewSRsrc		// Update the instruction to use NewSRsrc
SRsrc->setReg(NewSRsrc);		SRsrc->setReg(NewSRsrc);
}		}
}		}

void SIInstrInfo::moveToVALU(MachineInstr &TopInst) const {		void SIInstrInfo::moveToVALU(MachineInstr &TopInst) const {
SmallVector<MachineInstr *, 128> Worklist;		SetVectorType Worklist;
Worklist.push_back(&TopInst);		Worklist.insert(&TopInst);

while (!Worklist.empty()) {		while (!Worklist.empty()) {
MachineInstr &Inst = *Worklist.pop_back_val();		MachineInstr &Inst = *Worklist.pop_back_val();
MachineBasicBlock *MBB = Inst.getParent();		MachineBasicBlock *MBB = Inst.getParent();
MachineRegisterInfo &MRI = MBB->getParent()->getRegInfo();		MachineRegisterInfo &MRI = MBB->getParent()->getRegInfo();

unsigned Opcode = Inst.getOpcode();		unsigned Opcode = Inst.getOpcode();
unsigned NewOpcode = getVALUOp(Inst);		unsigned NewOpcode = getVALUOp(Inst);
▲ Show 20 Lines • Show All 184 Lines • ▼ Show 20 Lines	while (!Worklist.empty()) {
// Legalize the operands		// Legalize the operands
legalizeOperands(Inst);		legalizeOperands(Inst);

if (HasDst)		if (HasDst)
addUsersToMoveToVALUWorklist(NewDstReg, MRI, Worklist);		addUsersToMoveToVALUWorklist(NewDstReg, MRI, Worklist);
}		}
}		}

void SIInstrInfo::lowerScalarAbs(SmallVectorImpl<MachineInstr *> &Worklist,		void SIInstrInfo::lowerScalarAbs(SetVectorType &Worklist,
MachineInstr &Inst) const {		MachineInstr &Inst) const {
MachineBasicBlock &MBB = *Inst.getParent();		MachineBasicBlock &MBB = *Inst.getParent();
MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();		MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
MachineBasicBlock::iterator MII = Inst;		MachineBasicBlock::iterator MII = Inst;
DebugLoc DL = Inst.getDebugLoc();		DebugLoc DL = Inst.getDebugLoc();

MachineOperand &Dest = Inst.getOperand(0);		MachineOperand &Dest = Inst.getOperand(0);
MachineOperand &Src = Inst.getOperand(1);		MachineOperand &Src = Inst.getOperand(1);
unsigned TmpReg = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);		unsigned TmpReg = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
unsigned ResultReg = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);		unsigned ResultReg = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);

BuildMI(MBB, MII, DL, get(AMDGPU::V_SUB_I32_e32), TmpReg)		BuildMI(MBB, MII, DL, get(AMDGPU::V_SUB_I32_e32), TmpReg)
.addImm(0)		.addImm(0)
.addReg(Src.getReg());		.addReg(Src.getReg());

BuildMI(MBB, MII, DL, get(AMDGPU::V_MAX_I32_e64), ResultReg)		BuildMI(MBB, MII, DL, get(AMDGPU::V_MAX_I32_e64), ResultReg)
.addReg(Src.getReg())		.addReg(Src.getReg())
.addReg(TmpReg);		.addReg(TmpReg);

MRI.replaceRegWith(Dest.getReg(), ResultReg);		MRI.replaceRegWith(Dest.getReg(), ResultReg);
addUsersToMoveToVALUWorklist(ResultReg, MRI, Worklist);		addUsersToMoveToVALUWorklist(ResultReg, MRI, Worklist);
}		}

void SIInstrInfo::splitScalar64BitUnaryOp(		void SIInstrInfo::splitScalar64BitUnaryOp(
SmallVectorImpl<MachineInstr *> &Worklist, MachineInstr &Inst,		SetVectorType &Worklist, MachineInstr &Inst,
unsigned Opcode) const {		unsigned Opcode) const {
MachineBasicBlock &MBB = *Inst.getParent();		MachineBasicBlock &MBB = *Inst.getParent();
MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();		MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();

MachineOperand &Dest = Inst.getOperand(0);		MachineOperand &Dest = Inst.getOperand(0);
MachineOperand &Src0 = Inst.getOperand(1);		MachineOperand &Src0 = Inst.getOperand(1);
DebugLoc DL = Inst.getDebugLoc();		DebugLoc DL = Inst.getDebugLoc();

Show All 34 Lines	void SIInstrInfo::splitScalar64BitUnaryOp(
// We don't need to legalizeOperands here because for a single operand, src0		// We don't need to legalizeOperands here because for a single operand, src0
// will support any kind of input.		// will support any kind of input.

// Move all users of this moved value.		// Move all users of this moved value.
addUsersToMoveToVALUWorklist(FullDestReg, MRI, Worklist);		addUsersToMoveToVALUWorklist(FullDestReg, MRI, Worklist);
}		}

void SIInstrInfo::splitScalar64BitBinaryOp(		void SIInstrInfo::splitScalar64BitBinaryOp(
SmallVectorImpl<MachineInstr *> &Worklist, MachineInstr &Inst,		SetVectorType &Worklist, MachineInstr &Inst,
unsigned Opcode) const {		unsigned Opcode) const {
MachineBasicBlock &MBB = *Inst.getParent();		MachineBasicBlock &MBB = *Inst.getParent();
MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();		MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();

MachineOperand &Dest = Inst.getOperand(0);		MachineOperand &Dest = Inst.getOperand(0);
MachineOperand &Src0 = Inst.getOperand(1);		MachineOperand &Src0 = Inst.getOperand(1);
MachineOperand &Src1 = Inst.getOperand(2);		MachineOperand &Src1 = Inst.getOperand(2);
DebugLoc DL = Inst.getDebugLoc();		DebugLoc DL = Inst.getDebugLoc();
▲ Show 20 Lines • Show All 50 Lines • ▼ Show 20 Lines	void SIInstrInfo::splitScalar64BitBinaryOp(
legalizeOperands(LoHalf);		legalizeOperands(LoHalf);
legalizeOperands(HiHalf);		legalizeOperands(HiHalf);

// Move all users of this moved vlaue.		// Move all users of this moved vlaue.
addUsersToMoveToVALUWorklist(FullDestReg, MRI, Worklist);		addUsersToMoveToVALUWorklist(FullDestReg, MRI, Worklist);
}		}

void SIInstrInfo::splitScalar64BitBCNT(		void SIInstrInfo::splitScalar64BitBCNT(
SmallVectorImpl<MachineInstr *> &Worklist, MachineInstr &Inst) const {		SetVectorType &Worklist, MachineInstr &Inst) const {
MachineBasicBlock &MBB = *Inst.getParent();		MachineBasicBlock &MBB = *Inst.getParent();
MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();		MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();

MachineBasicBlock::iterator MII = Inst;		MachineBasicBlock::iterator MII = Inst;
DebugLoc DL = Inst.getDebugLoc();		DebugLoc DL = Inst.getDebugLoc();

MachineOperand &Dest = Inst.getOperand(0);		MachineOperand &Dest = Inst.getOperand(0);
MachineOperand &Src = Inst.getOperand(1);		MachineOperand &Src = Inst.getOperand(1);
Show All 19 Lines	void SIInstrInfo::splitScalar64BitBCNT(

MRI.replaceRegWith(Dest.getReg(), ResultReg);		MRI.replaceRegWith(Dest.getReg(), ResultReg);

// We don't need to legalize operands here. src0 for etiher instruction can be		// We don't need to legalize operands here. src0 for etiher instruction can be
// an SGPR, and the second input is unused or determined here.		// an SGPR, and the second input is unused or determined here.
addUsersToMoveToVALUWorklist(ResultReg, MRI, Worklist);		addUsersToMoveToVALUWorklist(ResultReg, MRI, Worklist);
}		}

void SIInstrInfo::splitScalar64BitBFE(SmallVectorImpl<MachineInstr *> &Worklist,		void SIInstrInfo::splitScalar64BitBFE(SetVectorType &Worklist,
MachineInstr &Inst) const {		MachineInstr &Inst) const {
MachineBasicBlock &MBB = *Inst.getParent();		MachineBasicBlock &MBB = *Inst.getParent();
MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();		MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
MachineBasicBlock::iterator MII = Inst;		MachineBasicBlock::iterator MII = Inst;
DebugLoc DL = Inst.getDebugLoc();		DebugLoc DL = Inst.getDebugLoc();

MachineOperand &Dest = Inst.getOperand(0);		MachineOperand &Dest = Inst.getOperand(0);
uint32_t Imm = Inst.getOperand(2).getImm();		uint32_t Imm = Inst.getOperand(2).getImm();
▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines	void SIInstrInfo::splitScalar64BitBFE(SetVectorType &Worklist,

MRI.replaceRegWith(Dest.getReg(), ResultReg);		MRI.replaceRegWith(Dest.getReg(), ResultReg);
addUsersToMoveToVALUWorklist(ResultReg, MRI, Worklist);		addUsersToMoveToVALUWorklist(ResultReg, MRI, Worklist);
}		}

void SIInstrInfo::addUsersToMoveToVALUWorklist(		void SIInstrInfo::addUsersToMoveToVALUWorklist(
unsigned DstReg,		unsigned DstReg,
MachineRegisterInfo &MRI,		MachineRegisterInfo &MRI,
SmallVectorImpl<MachineInstr *> &Worklist) const {		SetVectorType &Worklist) const {
for (MachineRegisterInfo::use_iterator I = MRI.use_begin(DstReg),		for (MachineRegisterInfo::use_iterator I = MRI.use_begin(DstReg),
E = MRI.use_end(); I != E;) {		E = MRI.use_end(); I != E;) {
MachineInstr &UseMI = *I->getParent();		MachineInstr &UseMI = *I->getParent();
if (!canReadVGPR(UseMI, I.getOperandNo())) {		if (!canReadVGPR(UseMI, I.getOperandNo())) {
Worklist.push_back(&UseMI);		Worklist.insert(&UseMI);

do {		do {
++I;		++I;
} while (I != E && I->getParent() == &UseMI);		} while (I != E && I->getParent() == &UseMI);
} else {		} else {
++I;		++I;
}		}
}		}
}		}

void SIInstrInfo::movePackToVALU(SmallVectorImpl<MachineInstr *> &Worklist,		void SIInstrInfo::movePackToVALU(SetVectorType &Worklist,
MachineRegisterInfo &MRI,		MachineRegisterInfo &MRI,
MachineInstr &Inst) const {		MachineInstr &Inst) const {
unsigned ResultReg = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);		unsigned ResultReg = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
MachineBasicBlock *MBB = Inst.getParent();		MachineBasicBlock *MBB = Inst.getParent();
MachineOperand &Src0 = Inst.getOperand(1);		MachineOperand &Src0 = Inst.getOperand(1);
MachineOperand &Src1 = Inst.getOperand(2);		MachineOperand &Src1 = Inst.getOperand(2);
const DebugLoc &DL = Inst.getDebugLoc();		const DebugLoc &DL = Inst.getDebugLoc();

▲ Show 20 Lines • Show All 46 Lines • ▼ Show 20 Lines	void SIInstrInfo::movePackToVALU(SetVectorType &Worklist,
}		}

MachineOperand &Dest = Inst.getOperand(0);		MachineOperand &Dest = Inst.getOperand(0);
MRI.replaceRegWith(Dest.getReg(), ResultReg);		MRI.replaceRegWith(Dest.getReg(), ResultReg);
addUsersToMoveToVALUWorklist(ResultReg, MRI, Worklist);		addUsersToMoveToVALUWorklist(ResultReg, MRI, Worklist);
}		}

void SIInstrInfo::addSCCDefUsersToVALUWorklist(		void SIInstrInfo::addSCCDefUsersToVALUWorklist(
MachineInstr &SCCDefInst, SmallVectorImpl<MachineInstr *> &Worklist) const {		MachineInstr &SCCDefInst, SetVectorType &Worklist) const {
// This assumes that all the users of SCC are in the same block		// This assumes that all the users of SCC are in the same block
// as the SCC def.		// as the SCC def.
for (MachineInstr &MI :		for (MachineInstr &MI :
llvm::make_range(MachineBasicBlock::iterator(SCCDefInst),		llvm::make_range(MachineBasicBlock::iterator(SCCDefInst),
SCCDefInst.getParent()->end())) {		SCCDefInst.getParent()->end())) {
// Exit if we find another SCC def.		// Exit if we find another SCC def.
if (MI.findRegisterDefOperandIdx(AMDGPU::SCC) != -1)		if (MI.findRegisterDefOperandIdx(AMDGPU::SCC) != -1)
return;		return;

if (MI.findRegisterUseOperandIdx(AMDGPU::SCC) != -1)		if (MI.findRegisterUseOperandIdx(AMDGPU::SCC) != -1)
Worklist.push_back(&MI);		Worklist.insert(&MI);
}		}
}		}

const TargetRegisterClass *SIInstrInfo::getDestEquivalentVGPRClass(		const TargetRegisterClass *SIInstrInfo::getDestEquivalentVGPRClass(
const MachineInstr &Inst) const {		const MachineInstr &Inst) const {
const TargetRegisterClass *NewDstRC = getOpRegClass(Inst, 0);		const TargetRegisterClass *NewDstRC = getOpRegClass(Inst, 0);

switch (Inst.getOpcode()) {		switch (Inst.getOpcode()) {
▲ Show 20 Lines • Show All 405 Lines • Show Last 20 Lines

llvm/trunk/test/CodeGen/AMDGPU/move-to-valu-worklist.ll

				; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck --check-prefix=GCN %s

				; In moveToVALU(), move to vector ALU is performed, all instrs in
				; the use chain will be visited. We do not want the same node to be
				; pushed to the visit worklist more than once.

				; GCN-LABEL: {{^}}in_worklist_once:
				; GCN: buffer_load_dword
				; GCN: BB0_1:
				; GCN: v_xor_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
				; GCN-NEXT: v_xor_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
				; GCN: v_and_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
				; GCN-NEXT: v_and_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
				define amdgpu_kernel void @in_worklist_once() #0 {
				bb:
				%tmp = load i64, i64* undef
				br label %bb1

				bb1: ; preds = %bb1, %bb
				%tmp2 = phi i64 [ undef, %bb ], [ %tmp16, %bb1 ]
				%tmp3 = phi i64 [ %tmp, %bb ], [ undef, %bb1 ]
				%tmp11 = shl i64 %tmp2, 14
				%tmp13 = xor i64 %tmp11, %tmp2
				%tmp15 = and i64 %tmp3, %tmp13
				%tmp16 = xor i64 %tmp15, %tmp3
				br label %bb1
				}

				attributes #0 = { nounwind }

This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Do not insert an instruction into worklist twice in movetovalu
ClosedPublic
