This is an archive of the discontinued LLVM Phabricator instance.

[x86] Teach the x86 backend that it can fold between TCRETURNm* and TCRETURNr* and fix latent bugs with register class updates.
ClosedPublic

Authored by chandlerc on Jul 23 2018, 8:35 PM.

Details

Reviewers

Commits

rGc9313a9ecbad: [x86] Teach the x86 backend that it can fold between TCRETURNm* and TCRETURNr*…
rL337845: [x86] Teach the x86 backend that it can fold between TCRETURNm* and TCRETURNr*…

Summary

Enabling this fully exposes a latent bug in the instruction folding: we
never update the register constraints for the register operands when
fusing a load into another operation. The fused form could, in theory,
have different register constraints on its operands. And in fact,
TCRETURNm* needs its memory operands to use tailcall compatible
registers.

I've updated the folding code to re-constrain all the registers after
they are mapped onto their new instruction.
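
To make the re-constraining step concrete, here is a minimal, self-contained sketch. It is a toy model, not the LLVM API: register classes are modelled as bitmasks of physical registers, and every name in it (`ToyFunction`, `ToyOperand`, `toyConstrainRegClass`, `reconstrainOperands`) is hypothetical. The assumption is that constraining a virtual register means intersecting its current class with the class the new opcode requires for that operand, and that an empty intersection is reported rather than applied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model: a register class is a bitmask of permitted physical registers.
using RegClassMask = std::uint32_t;

struct ToyOperand {
  bool IsVirtualReg; // only virtual registers have a class we may narrow
  unsigned VReg;     // index into ToyFunction::VRegClass
};

struct ToyFunction {
  std::vector<RegClassMask> VRegClass; // current class of each virtual register
};

// Narrow VReg's class to what it has in common with Required.
// Returns the narrowed class, or 0 (leaving the class untouched) if the two
// classes share no register.
RegClassMask toyConstrainRegClass(ToyFunction &F, unsigned VReg,
                                  RegClassMask Required) {
  RegClassMask Common = F.VRegClass[VReg] & Required;
  if (Common != 0)
    F.VRegClass[VReg] = Common;
  return Common;
}

// After the operands have been copied onto the fused instruction, visit each
// one and apply the constraint the new opcode places on that operand slot.
// Returns how many operands could not be constrained.
unsigned reconstrainOperands(ToyFunction &F, const std::vector<ToyOperand> &Ops,
                             const std::vector<RegClassMask> &RequiredByNewOp) {
  assert(Ops.size() == RequiredByNewOp.size() && "one constraint per operand");
  unsigned Failures = 0;
  for (std::size_t Idx = 0; Idx < Ops.size(); ++Idx) {
    if (!Ops[Idx].IsVirtualReg)
      continue; // immediates and physical registers are left alone
    if (toyConstrainRegClass(F, Ops[Idx].VReg, RequiredByNewOp[Idx]) == 0)
      ++Failures;
  }
  return Failures;
}
```

In this sketch a failure only bumps a counter, which loosely mirrors the patch's choice to emit a debug warning rather than abort the fold.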

However, we still can't enable folding in the general case from
TCRETURNr* to TCRETURNm* because doing so may require more registers to
be available during the tail call. If the call itself uses all but one
register, and the folded load would require both a base and index
register, there will not be enough registers to allocate the tail call.
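
The register-count argument above can be written as a small piece of arithmetic. The sketch below is illustrative only: `tailCallTargetFits` and its parameter names are hypothetical, and the register counts used in the checks are made-up numbers, not real x86 calling-convention figures.

```cpp
// Does the tail-call target still fit once the call's own registers are taken?
// AllocatableRegs:   registers the allocator may use at the tail call
// RegsPinnedByCall:  registers already occupied by the call itself
// AddressRegsNeeded: 1 for a register target, up to 2 (base + index) for a
//                    folded memory target
constexpr bool tailCallTargetFits(unsigned AllocatableRegs,
                                  unsigned RegsPinnedByCall,
                                  unsigned AddressRegsNeeded) {
  return RegsPinnedByCall <= AllocatableRegs &&
         AllocatableRegs - RegsPinnedByCall >= AddressRegsNeeded;
}

// With all but one register pinned, the register form fits...
static_assert(tailCallTargetFits(7, 6, 1), "r form needs one register");
// ...but a folded load that needs base and index does not.
static_assert(!tailCallTargetFits(7, 6, 2), "m form may need two registers");
```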

It would be better, IMO, to teach the register allocator to *unfold*
TCRETURNm* when it runs out of registers (or specifically check the
number of registers available during the TCRETURNr*) but I'm not going
to try and solve that for now. Instead, I've just blocked the forward
folding from r -> m, leaving LLVM free to unfold from m -> r as that
doesn't introduce new register pressure constraints.
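
The one-directional fold described above can be sketched with a toy fold table. This is a simplified stand-in for the real tables, assuming only that each entry pairs a register form with a memory form and carries a flag that blocks the r -> m direction. `ToyFoldEntry`, `ToyNoForward`, `foldToMemForm` and `unfoldToRegForm` are hypothetical names; only the opcode strings come from the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

constexpr unsigned ToyFoldedLoad = 1u << 0; // the memory form folds a load
constexpr unsigned ToyNoForward = 1u << 1;  // never fold r -> m via this entry

struct ToyFoldEntry {
  std::string RegOp;
  std::string MemOp;
  unsigned Flags;
};

const std::vector<ToyFoldEntry> &toyFoldTable() {
  static const std::vector<ToyFoldEntry> Table = {
      {"TAILJMPr", "TAILJMPm", ToyFoldedLoad},
      {"TCRETURNri", "TCRETURNmi", ToyFoldedLoad | ToyNoForward},
  };
  return Table;
}

// r -> m: returns the memory form, or an empty string if there is no entry or
// the entry forbids forward folding.
std::string foldToMemForm(const std::string &RegOp) {
  assert(!RegOp.empty() && "expected an opcode name");
  for (const ToyFoldEntry &E : toyFoldTable())
    if (E.RegOp == RegOp && !(E.Flags & ToyNoForward))
      return E.MemOp;
  return "";
}

// m -> r: unfolding ignores ToyNoForward, since going back to the register
// form adds no new register pressure in this model.
std::string unfoldToRegForm(const std::string &MemOp) {
  assert(!MemOp.empty() && "expected an opcode name");
  for (const ToyFoldEntry &E : toyFoldTable())
    if (E.MemOp == MemOp)
      return E.RegOp;
  return "";
}
```

Keeping the entry in the table while flagging it, rather than omitting it, is what lets the unfold lookup keep working while the forward lookup is refused.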

The downside is that I don't have anything that will directly exercise
this. Instead, I will be immediately using this in my SLH patch. =/

Still worse, without allowing the TCRETURNr* -> TCRETURNm* fold, I don't
have any tests that demonstrate the failure to update the memory operand
register constraints. This patch still seems correct, but I'm nervous
about the degree of testing due to this.

Suggestions?

Diff Detail

Repository: rL LLVM

Event Timeline

chandlerc created this revision. Jul 23 2018, 8:35 PM

Herald added subscribers: hiraditya, mcrosier, sanjoy. Jul 23 2018, 8:35 PM

Harbormaster completed remote builds in B20634: Diff 156965. Jul 23 2018, 8:35 PM

craig.topper added inline comments. Jul 23 2018, 8:41 PM

llvm/lib/Target/X86/X86InstrInfo.cpp
Line 4672 (On Diff #156965): Not at my computer but can this be NewMI.getDesc instead of TII.get?

chandlerc marked an inline comment as done. Jul 23 2018, 8:42 PM

chandlerc added inline comments.

llvm/lib/Target/X86/X86InstrInfo.cpp
Line 4672 (On Diff #156965): Doh, yes of course. Will update patch when back at my computer as well.

Use NewMI.getDesc() rather than looking it up in TII again. Thanks to Craig for the suggestion.

craig.topper added inline comments. Jul 24 2018, 10:12 AM

llvm/lib/Target/X86/X86InstrInfo.cpp
Line 4662 (On Diff #157008): This is cute and all, but why not a normal integer for loop?

chandlerc added inline comments. Jul 24 2018, 10:57 AM

llvm/lib/Target/X86/X86InstrInfo.cpp
Line 4662 (On Diff #157008): I find it shorter and more clear (you can look at the review for Sequence.h for other commentary there). If it bothers folks, I can remove it, I just routinely use it in new code.

LGTM

This revision is now accepted and ready to land. Jul 24 2018, 11:43 AM

Closed by commit rL337845: [x86] Teach the x86 backend that it can fold between TCRETURNm* and TCRETURNr*… (authored by chandlerc). Jul 24 2018, 12:04 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                Size
llvm/trunk/lib/Target/X86/X86InstrFoldTables.cpp    2 lines
llvm/trunk/lib/Target/X86/X86InstrInfo.cpp          31 lines

Diff 157100

llvm/trunk/lib/Target/X86/X86InstrFoldTables.cpp

(334 lines hidden) static const X86MemoryFoldTableEntry MemoryFoldTable0[] = {
   { X86::SETNPr, X86::SETNPm, TB_FOLDED_STORE },
   { X86::SETNSr, X86::SETNSm, TB_FOLDED_STORE },
   { X86::SETOr, X86::SETOm, TB_FOLDED_STORE },
   { X86::SETPr, X86::SETPm, TB_FOLDED_STORE },
   { X86::SETSr, X86::SETSm, TB_FOLDED_STORE },
   { X86::TAILJMPr, X86::TAILJMPm, TB_FOLDED_LOAD },
   { X86::TAILJMPr64, X86::TAILJMPm64, TB_FOLDED_LOAD },
   { X86::TAILJMPr64_REX, X86::TAILJMPm64_REX, TB_FOLDED_LOAD },
+  { X86::TCRETURNri, X86::TCRETURNmi, TB_FOLDED_LOAD | TB_NO_FORWARD },
+  { X86::TCRETURNri64, X86::TCRETURNmi64, TB_FOLDED_LOAD | TB_NO_FORWARD },
   { X86::TEST16ri, X86::TEST16mi, TB_FOLDED_LOAD },
   { X86::TEST16rr, X86::TEST16mr, TB_FOLDED_LOAD },
   { X86::TEST32ri, X86::TEST32mi, TB_FOLDED_LOAD },
   { X86::TEST32rr, X86::TEST32mr, TB_FOLDED_LOAD },
   { X86::TEST64ri32, X86::TEST64mi32, TB_FOLDED_LOAD },
   { X86::TEST64rr, X86::TEST64mr, TB_FOLDED_LOAD },
   { X86::TEST8ri, X86::TEST8mi, TB_FOLDED_LOAD },
   { X86::TEST8rr, X86::TEST8mr, TB_FOLDED_LOAD },
(5,060 lines hidden)

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp

(13 lines hidden)
 #include "X86InstrInfo.h"
 #include "X86.h"
 #include "X86InstrBuilder.h"
 #include "X86InstrFoldTables.h"
 #include "X86MachineFunctionInfo.h"
 #include "X86Subtarget.h"
 #include "X86TargetMachine.h"
 #include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/Sequence.h"
 #include "llvm/CodeGen/LivePhysRegs.h"
 #include "llvm/CodeGen/LiveVariables.h"
 #include "llvm/CodeGen/MachineConstantPool.h"
 #include "llvm/CodeGen/MachineDominators.h"
 #include "llvm/CodeGen/MachineFrameInfo.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
 #include "llvm/CodeGen/MachineModuleInfo.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
(4,617 lines hidden) for (unsigned i = 0; i != NumAddrOps; ++i) {
         MIB.addDisp(MO, PtrOffset);
       } else {
         MIB.add(MO);
       }
     }
   }
 }

+static void updateOperandRegConstraints(MachineFunction &MF,
+                                        MachineInstr &NewMI,
+                                        const TargetInstrInfo &TII) {
+  MachineRegisterInfo &MRI = MF.getRegInfo();
+  const TargetRegisterInfo &TRI = *MRI.getTargetRegisterInfo();
+
+  for (int Idx : llvm::seq<int>(0, NewMI.getNumOperands())) {
+    MachineOperand &MO = NewMI.getOperand(Idx);
+    // We only need to update constraints on virtual register operands.
+    if (!MO.isReg())
+      continue;
+    unsigned Reg = MO.getReg();
+    if (!TRI.isVirtualRegister(Reg))
+      continue;
+
+    auto *NewRC = MRI.constrainRegClass(
+        Reg, TII.getRegClass(NewMI.getDesc(), Idx, &TRI, MF));
+    if (!NewRC) {
+      LLVM_DEBUG(
+          dbgs() << "WARNING: Unable to update register constraint for operand "
+                 << Idx << " of instruction:\n";
+          NewMI.dump(); dbgs() << "\n");
+    }
+  }
+}

 static MachineInstr *FuseTwoAddrInst(MachineFunction &MF, unsigned Opcode,
                                      ArrayRef<MachineOperand> MOs,
                                      MachineBasicBlock::iterator InsertPt,
                                      MachineInstr &MI,
                                      const TargetInstrInfo &TII) {
   // Create the base instruction with the memory operand as the first part.
   // Omit the implicit operands, something BuildMI can't do.
   MachineInstr *NewMI =
       MF.CreateMachineInstr(TII.get(Opcode), MI.getDebugLoc(), true);
   MachineInstrBuilder MIB(MF, NewMI);
   addOperands(MIB, MOs);

   // Loop over the rest of the ri operands, converting them over.
   unsigned NumOps = MI.getDesc().getNumOperands() - 2;
   for (unsigned i = 0; i != NumOps; ++i) {
     MachineOperand &MO = MI.getOperand(i + 2);
     MIB.add(MO);
   }
   for (unsigned i = NumOps + 2, e = MI.getNumOperands(); i != e; ++i) {
     MachineOperand &MO = MI.getOperand(i);
     MIB.add(MO);
   }

+  updateOperandRegConstraints(MF, *NewMI, TII);
+
   MachineBasicBlock *MBB = InsertPt->getParent();
   MBB->insert(InsertPt, NewMI);

   return MIB;
 }

 static MachineInstr *FuseInst(MachineFunction &MF, unsigned Opcode,
                               unsigned OpNo, ArrayRef<MachineOperand> MOs,
(10 lines hidden) for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
     if (i == OpNo) {
       assert(MO.isReg() && "Expected to fold into reg operand!");
       addOperands(MIB, MOs, PtrOffset);
     } else {
       MIB.add(MO);
     }
   }

+  updateOperandRegConstraints(MF, *NewMI, TII);
+
   MachineBasicBlock *MBB = InsertPt->getParent();
   MBB->insert(InsertPt, NewMI);

   return MIB;
 }

 static MachineInstr *MakeM0Inst(const TargetInstrInfo &TII, unsigned Opcode,
                                 ArrayRef<MachineOperand> MOs,
(2,995 lines hidden)