This is an archive of the discontinued LLVM Phabricator instance.

Prototype fix for replacing the SETcc/MOVZX idiom with XOR/SETcc for a 32-bit conditional set of 0,1
Abandoned · Public

Authored by DavidKreitzer on Jun 17 2016, 1:38 PM.


Details

Reviewers: None

Summary

I am just posting this for your reference. This was my initial attempt to fix the SETcc problem back in February. As you can see, the changes weren't all that extensive. The basic approach was to define new SETCC pseudo-opcodes that return 32-bit/64-bit values along with patterns that cause a MOV32r0/SETCC sequence to be generated. The pseudo-ops then get lowered after register allocation (RA) to the "real" SETCC opcodes.

The approach seems sound to me, and it fixed the performance kernel that I wrote. But there are a few things that I don't like.

(1) The opcode explosion is kind of gross: each of the 16 conditions gets two new pseudo-opcodes. We should really have a condition field rather than building the condition into the opcode.
(2) I couldn't figure out a good way to handle the different register constraints in 32-bit vs. 64-bit mode (GR32_ABCD vs. GR32), so I created separate opcodes for them.

During testing, I got lots of failures in our test suites and never got around to debugging them. It could be something obvious/trivial.

So, feel free to take this code and run with it or just give me feedback on the basic approach.


Event Timeline

DavidKreitzer updated this revision to Diff 61124. Jun 17 2016, 1:38 PM

DavidKreitzer retitled this revision to Prototype fix for replacing the SETcc/MOVZX idiom with XOR/SETcc for a 32-bit conditional set of 0,1.

DavidKreitzer updated this object.

DavidKreitzer added a reviewer: mkuper.

Hi Dave,

I need to look at this more carefully (I'm not entirely sure it works as intended), but regardless, I'm not a huge fan of having two different opcodes, one for SETCC with a source, and one for SETCC without a source. I mean, imagine we were designing this from scratch. I don't think we'd define two opcodes.

What I've wanted to do - and started doing last year - was to add a source dependency to the regular SETCC opcode and feed it an IMPLICIT_DEF value when it doesn't matter. The problem is that this would require touching every piece of code that creates a SETCC, and probably a lot of places that match on it, hence the extensive changes I was referring to.

Thanks for the quick response, Michael!

I don't think it's unreasonable to have two opcodes - the existing opcode is an 8-bit definition. The new opcode is a 32-bit definition. It is not much different than having two opcodes for addb and addl. The fact that the 32-bit form takes a source operand and the 8-bit form doesn't is just an artifact of the architecture.

FWIW, this is how we implemented it in the Intel Compiler (i.e. two opcodes). Of course, in the Intel Compiler, we don't have the same opcode explosion problem, because the condition is separate from the opcode. So it's really just 2 opcodes, not 2xN opcodes.

I would not be surprised at all if the changes do not work as intended, especially the pattern specifications. I understand that code better than I did 6 months ago, but I'm still pretty new to it.

I don't think the add analogy really holds. addb and addl really are two different opcodes in the ISA, with different semantics. And that also means that they are different all the way down to the MC layer.

What really worries me, functionally speaking, isn't the patterns - I'm not an expert on those either :-). It's whether, after ExpandPostRA, you're guaranteed that the xor and the (now regular) setcc will actually stay where they need to be w.r.t. each other, without the setcc depending on the output of the xor.
Are you sure this is true? And I don't mean in the sense that it works now, but as a general invariant. I'm not saying this isn't true; we may in fact already be relying on this. I just don't know.

This is really the main reason I would prefer having a single opcode - that guarantees that we'll have an explicit dependency all the way through. (Or we could conceivably have two different opcodes all the way through, but, yikes.)

Your question is a good one, and I don't know the answer. Who knows? That might be the cause of all the failures I was getting.

So what you are basically proposing is to have a single opcode that produces a 32-bit result, right? I can buy into that proposal. But yes, that will require much more extensive changes, I expect.

After thinking about this a bit more, what I wrote before is probably nonsense, and you're likely right.

I mean, on the DAG level, you really have to have the explicit dependence. But once you're on the MI level, any instruction that clobbers the low GR8 has to have an implicit dependency on the GR32, otherwise nothing would work right. (E.g. MOV8ri is also modeled as only having a GR8 out operand, but no sources). I'll need to check how that really works.

Anyway, I'm still not a fan of having separate opcodes, but the objections I have right now are much weaker than "it won't work". :-)

Ok, I'm an idiot.

The dependence we have on the MI level is between the xor and the eventual user of the GR32 (with the setcc clobbering the low GR8).
The fact that the setcc doesn't depend on the xor doesn't matter at the MI stage.

Just clearing this out of my review queue. :-)

DavidKreitzer abandoned this revision. Jan 19 2017, 5:38 AM

Revision Contents

Path                                     Size
lib/Target/X86/X86InstrCMovSetCC.td      7 lines
lib/Target/X86/X86InstrCompiler.td       30 lines
lib/Target/X86/X86InstrInfo.cpp          44 lines
utils/TableGen/CodeGenDAGPatterns.cpp    4 lines

Diff 61124

lib/Target/X86/X86InstrCMovSetCC.td

[...]
     def r : I<opc, MRMXr, (outs GR8:$dst), (ins),
               !strconcat(Mnemonic, "\t$dst"),
               [(set GR8:$dst, (X86setcc OpNode, EFLAGS))],
               IIC_SET_R>, TB, Sched<[WriteALU]>;
     def m : I<opc, MRMXm, (outs), (ins i8mem:$dst),
               !strconcat(Mnemonic, "\t$dst"),
               [(store (X86setcc OpNode, EFLAGS), addr:$dst)],
               IIC_SET_M>, TB, Sched<[WriteALU, WriteStore]>;
   } // Uses = [EFLAGS]
+
+  let Uses = [EFLAGS], isPseudo = 1, Constraints = "$src = $dst" in {
+    def r32_32 : I<0, Pseudo, (outs GR32_ABCD:$dst), (ins GR32_ABCD:$src), "",
+                   [], IIC_ALU_NONMEM>, Sched<[WriteZero]>;
+    def r32_64 : I<0, Pseudo, (outs GR32:$dst), (ins GR32:$src), "",
+                   [], IIC_ALU_NONMEM>, Sched<[WriteZero]>;
+  } // Uses = [EFLAGS], Constraints = "$src = $dst"
 }

 defm SETO : SETCC<0x90, "seto", X86_COND_O>; // is overflow bit set
 defm SETNO : SETCC<0x91, "setno", X86_COND_NO>; // is overflow bit not set
 defm SETB : SETCC<0x92, "setb", X86_COND_B>; // unsigned less than
 defm SETAE : SETCC<0x93, "setae", X86_COND_AE>; // unsigned greater or equal
 defm SETE : SETCC<0x94, "sete", X86_COND_E>; // equal to
 defm SETNE : SETCC<0x95, "setne", X86_COND_NE>; // not equal to
[...]

lib/Target/X86/X86InstrCompiler.td

[...]
 defm : CMOVmr<X86_COND_G , CMOVLE16rm, CMOVLE32rm, CMOVLE64rm>;
 defm : CMOVmr<X86_COND_P , CMOVNP16rm, CMOVNP32rm, CMOVNP64rm>;
 defm : CMOVmr<X86_COND_NP, CMOVP16rm , CMOVP32rm , CMOVP64rm>;
 defm : CMOVmr<X86_COND_S , CMOVNS16rm, CMOVNS32rm, CMOVNS64rm>;
 defm : CMOVmr<X86_COND_NS, CMOVS16rm , CMOVS32rm , CMOVS64rm>;
 defm : CMOVmr<X86_COND_O , CMOVNO16rm, CMOVNO32rm, CMOVNO64rm>;
 defm : CMOVmr<X86_COND_NO, CMOVO16rm , CMOVO32rm , CMOVO64rm>;

+multiclass SETCC32<Instruction SetCCr32_32, Instruction SetCCr32_64,
+                   PatLeaf Cond> {
+  def : Pat<(i32 (zext (i8 (X86setcc Cond, EFLAGS)))),
+            (SetCCr32_32 (MOV32r0))>,
+        Requires<[Not64BitMode]>;
+  def : Pat<(i32 (zext (i8 (X86setcc Cond, EFLAGS)))),
+            (SetCCr32_64 (MOV32r0))>,
+        Requires<[In64BitMode]>;
+  def : Pat<(i64 (zext (i8 (X86setcc Cond, EFLAGS)))),
+            (SUBREG_TO_REG (i64 0), (SetCCr32_64 (MOV32r0)), sub_32bit)>,
+        Requires<[In64BitMode]>;
+}
+
+defm : SETCC32<SETOr32_32, SETOr32_64, X86_COND_O>;
+defm : SETCC32<SETNOr32_32, SETNOr32_64, X86_COND_NO>;
+defm : SETCC32<SETBr32_32, SETBr32_64, X86_COND_B>;
+defm : SETCC32<SETAEr32_32, SETAEr32_64, X86_COND_AE>;
+defm : SETCC32<SETEr32_32, SETEr32_64, X86_COND_E>;
+defm : SETCC32<SETNEr32_32, SETNEr32_64, X86_COND_NE>;
+defm : SETCC32<SETBEr32_32, SETBEr32_64, X86_COND_BE>;
+defm : SETCC32<SETAr32_32, SETAr32_64, X86_COND_A>;
+defm : SETCC32<SETSr32_32, SETSr32_64, X86_COND_S>;
+defm : SETCC32<SETNSr32_32, SETNSr32_64, X86_COND_NS>;
+defm : SETCC32<SETPr32_32, SETPr32_64, X86_COND_P>;
+defm : SETCC32<SETNPr32_32, SETNPr32_64, X86_COND_NP>;
+defm : SETCC32<SETLr32_32, SETLr32_64, X86_COND_L>;
+defm : SETCC32<SETGEr32_32, SETGEr32_64, X86_COND_GE>;
+defm : SETCC32<SETLEr32_32, SETLEr32_64, X86_COND_LE>;
+defm : SETCC32<SETGr32_32, SETGr32_64, X86_COND_G>;
+
 // zextload bool -> zextload byte
 def : Pat<(zextloadi8i1 addr:$src), (AND8ri (MOV8rm addr:$src), (i8 1))>;
 def : Pat<(zextloadi16i1 addr:$src), (AND16ri8 (MOVZX16rm8 addr:$src), (i16 1))>;
 def : Pat<(zextloadi32i1 addr:$src), (AND32ri8 (MOVZX32rm8 addr:$src), (i32 1))>;
 def : Pat<(zextloadi64i1 addr:$src),
           (SUBREG_TO_REG (i64 0),
            (AND32ri8 (MOVZX32rm8 addr:$src), (i32 1)), sub_32bit)>;

[...]

lib/Target/X86/X86InstrInfo.cpp


[...]
 static void expandLoadStackGuard(MachineInstrBuilder &MIB,
[...]
   BuildMI(MBB, I, DL, TII.get(X86::MOV64rm), Reg).addReg(X86::RIP).addImm(1)
       .addReg(0).addGlobalAddress(GV, 0, X86II::MO_GOTPCREL).addReg(0)
       .addMemOperand(MMO);
   MIB->setDebugLoc(DL);
   MIB->setDesc(TII.get(X86::MOV64rm));
   MIB.addReg(Reg, RegState::Kill).addImm(1).addReg(0).addImm(0).addReg(0);
 }

+static bool ExpandSetCCPseudo(MachineInstrBuilder &MIB,
+                              const MCInstrDesc &Desc) {
+  MachineOperand &DestOp = MIB->getOperand(0);
+  unsigned Reg = getX86SubSuperRegister(DestOp.getReg(), 8);
+
+  MIB->setDesc(Desc);
+  DestOp.setReg(Reg);
+
+  return true;
+}
+
 bool X86InstrInfo::expandPostRAPseudo(MachineBasicBlock::iterator MI) const {
   bool HasAVX = Subtarget.hasAVX();
   MachineInstrBuilder MIB(*MI->getParent()->getParent(), MI);
   switch (MI->getOpcode()) {
   case X86::MOV32r0:
     return Expand2AddrUndef(MIB, get(X86::XOR32rr));
   case X86::MOV32r1:
     return expandMOV32r1(MIB, this, /*MinusOne=*/ false);
[...]
   case X86::AVX2_SETALLONES:
     return Expand2AddrUndef(MIB, get(X86::VPCMPEQDYrr));
   case X86::TEST8ri_NOREX:
     MI->setDesc(get(X86::TEST8ri));
     return true;
   case X86::MOV32ri64:
     MI->setDesc(get(X86::MOV32ri));
     return true;
+
+  case X86::SETOr32_64:
+  case X86::SETOr32_32: return ExpandSetCCPseudo(MIB, get(X86::SETOr));
+  case X86::SETNOr32_64:
+  case X86::SETNOr32_32: return ExpandSetCCPseudo(MIB, get(X86::SETNOr));
+  case X86::SETBr32_64:
+  case X86::SETBr32_32: return ExpandSetCCPseudo(MIB, get(X86::SETBr));
+  case X86::SETAEr32_64:
+  case X86::SETAEr32_32: return ExpandSetCCPseudo(MIB, get(X86::SETAEr));
+  case X86::SETEr32_64:
+  case X86::SETEr32_32: return ExpandSetCCPseudo(MIB, get(X86::SETEr));
+  case X86::SETNEr32_64:
+  case X86::SETNEr32_32: return ExpandSetCCPseudo(MIB, get(X86::SETNEr));
+  case X86::SETBEr32_64:
+  case X86::SETBEr32_32: return ExpandSetCCPseudo(MIB, get(X86::SETBEr));
+  case X86::SETAr32_64:
+  case X86::SETAr32_32: return ExpandSetCCPseudo(MIB, get(X86::SETAr));
+  case X86::SETSr32_64:
+  case X86::SETSr32_32: return ExpandSetCCPseudo(MIB, get(X86::SETSr));
+  case X86::SETNSr32_64:
+  case X86::SETNSr32_32: return ExpandSetCCPseudo(MIB, get(X86::SETNSr));
+  case X86::SETPr32_64:
+  case X86::SETPr32_32: return ExpandSetCCPseudo(MIB, get(X86::SETPr));
+  case X86::SETNPr32_64:
+  case X86::SETNPr32_32: return ExpandSetCCPseudo(MIB, get(X86::SETNPr));
+  case X86::SETLr32_64:
+  case X86::SETLr32_32: return ExpandSetCCPseudo(MIB, get(X86::SETLr));
+  case X86::SETGEr32_64:
+  case X86::SETGEr32_32: return ExpandSetCCPseudo(MIB, get(X86::SETGEr));
+  case X86::SETLEr32_64:
+  case X86::SETLEr32_32: return ExpandSetCCPseudo(MIB, get(X86::SETLEr));
+  case X86::SETGr32_64:
+  case X86::SETGr32_32: return ExpandSetCCPseudo(MIB, get(X86::SETGr));

   // KNL does not recognize dependency-breaking idioms for mask registers,
   // so kxnor %k1, %k1, %k2 has a RAW dependence on %k1.
   // Using %k0 as the undef input register is a performance heuristic based
   // on the assumption that %k0 is used less frequently than the other mask
   // registers, since it is not usable as a write mask.
   // FIXME: A more advanced approach would be to choose the best input mask
   // register based on context.
   case X86::KSET0B:
[...]

utils/TableGen/CodeGenDAGPatterns.cpp

[...]
 TreePatternNode *TreePattern::ParseTreePattern(Init *TheInit, StringRef OpName){
[...]
   }

   DagInit *Dag = dyn_cast<DagInit>(TheInit);
   if (!Dag) {
     TheInit->dump();
     error("Pattern has unexpected init kind!");
   }
   DefInit *OpDef = dyn_cast<DefInit>(Dag->getOperator());
-  if (!OpDef) error("Pattern has unexpected operator type!");
+  if (!OpDef) {
+    error("Pattern has unexpected operator type!");
+  }
   Record *Operator = OpDef->getDef();

   if (Operator->isSubClassOf("ValueType")) {
     // If the operator is a ValueType, then this must be "type cast" of a leaf
     // node.
     if (Dag->getNumArgs() != 1)
       error("Type cast only takes one operand!");

[...]