This is an archive of the discontinued LLVM Phabricator instance.


Differential D22229

[X86] Only apply setcc fixup if GR32_ABCDs are free.
Needs Review, Public

Authored by bryant on Jul 11 2016, 10:50 AM.


Details

Reviewers

compnerd
mkuper

Summary

X86FixupSetCC transforms setcc zero-ext sequences of the form (intel syntax),

eflags_def_instr
setcc gr8
movzbl gr32, gr8

into:

xor gr32, gr32
eflags_def_instr
setcc gr8

On x86 targets, it's possible for eflags_def_instr to use or def all available GR32 registers (e.g., cmpxchg8b implicitly uses all of E{ABCD}X). Under such circumstances, this transformation should not occur.
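As a rough illustration of the condition described above, the following standalone C++ sketch models the check with a plain bitset. The `Reg` enum and `usesAllAbcd` are hypothetical names used only for this example; they are not LLVM's `X86::` enumerators or the patch's actual code, and sub-register aliasing is ignored.

```cpp
#include <bitset>
#include <cstddef>
#include <initializer_list>

// Hypothetical stand-ins for i686 registers; EAX..EDX are the four 32-bit
// registers that have an 8-bit low sub-register.
enum Reg : std::size_t { EAX, EBX, ECX, EDX, ESI, EDI, NumRegs };

// Returns true when the registers touched by one instruction cover all of
// EAX, EBX, ECX and EDX, so none of them could be zeroed ahead of it.
bool usesAllAbcd(std::initializer_list<Reg> touched) {
  std::bitset<NumRegs> freeAbcd;
  freeAbcd.set(EAX).set(EBX).set(ECX).set(EDX);
  for (Reg r : touched)
    freeAbcd.reset(r); // any def or use makes the register unavailable
  return freeAbcd.none();
}
```

Under this model, an instruction like cmpxchg8b with a memory operand based on ESI touches EAX, EBX, ECX, EDX and ESI, so the check returns true and the fixup would be skipped.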

Diff Detail

Repository: rL LLVM

Event Timeline

bryant updated this revision to Diff 63535. Jul 11 2016, 10:50 AM

bryant retitled this revision to [X86] Only apply setcc fixup if GR32_ABCDs are free.

bryant updated this object.

bryant added a reviewer: mkuper.

bryant set the repository for this revision to rL LLVM.

bryant updated this object.

Thanks, bryant!

I'm not entirely sure this is the right fix for PR28489, for two reasons:

1. I'm not convinced this is a bug on the pass side and not the regalloc side. Sure, this is "fast" regalloc, but is it really supposed to just run out of registers here instead of spilling the zero?
2. Even if so, I may have been too restrictive in requiring GR32_ABCD for both ZeroReg and InsertReg. InsertReg certainly must be GR32_ABCD, but ZeroReg may be a regular GR32. Sure, we'd like to map ZeroReg and InsertReg to the same register (using a sane RA, not fast). But a sane RA would put them in the same register unless it can't, anyway. And relaxing this constraint should help FastRA handle this issue.

Regardless, I'm thinking about turning this whole pass off for CodeGenOpt::None. It's an optimization pass, and has no benefit with fast RA anyway, since it won't properly merge the registers...

I have a couple of questions.

Under -O3, which uses the sane RA, the test case I've provided generates:

cmpxchg8b
movl gr32, 0
setcc gr8

Is this what you meant by "spilling the zero"? Because the binary encoding for
this sequence,

0:   0f c7 0e                cmpxchg8b (%esi)
3:   a1 00 00 00 00          mov    0x0,%eax
8:   0f 94 c0                sete   %al

is two bytes longer than the result otherwise gotten without the pass:

 b:   0f c7 0e                cmpxchg8b (%esi)
 e:   0f 94 c0                sete   %al
11:   0f b6 c0                movzbl %al,%eax

which defeats one of the original intended benefits of this pass. So relying on a spill produces a less-optimal result.
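The size comparison can be checked by summing the per-instruction encoding lengths read off the two listings above. This is only a small arithmetic sketch; `totalBytes`, `withFixup` and `withoutFixup` are names made up for the example.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Sums per-instruction encoding lengths, in bytes.
std::size_t totalBytes(const std::vector<std::size_t> &lengths) {
  return std::accumulate(lengths.begin(), lengths.end(), std::size_t{0});
}

// With the pass: cmpxchg8b (3), mov 0x0,%eax (5), sete %al (3).
const std::vector<std::size_t> withFixup = {3, 5, 3};
// Without the pass: cmpxchg8b (3), sete %al (3), movzbl %al,%eax (3).
const std::vector<std::size_t> withoutFixup = {3, 3, 3};
```

The totals come to 11 and 9 bytes respectively, which matches the two-byte difference stated above.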

What would the final machine output be under i686 for an INSERT_SUBREG with a non-GR32_ABCD operand, since only the ABCDs have an 8-bit sub-register?

In D22229#480675, @bryant wrote:
I have a couple of questions.

Under -O3, which uses the sane RA, the test case I've provided generates:
cmpxchg8b
movl gr32, 0
setcc gr8
Is this what you meant by "spilling the zero"? Because the binary encoding for
this sequence,
0:   0f c7 0e                cmpxchg8b (%esi)
3:   a1 00 00 00 00          mov    0x0,%eax
8:   0f 94 c0                sete   %al
is two bytes longer than the result otherwise gotten without the pass:
 b:   0f c7 0e                cmpxchg8b (%esi)
 e:   0f 94 c0                sete   %al
11:   0f b6 c0                movzbl %al,%eax
which defeats one of the original intended benefits of this pass. So relying on a spill produces a less-optimal result.

Yes, I agree this is a pessimization. But it's probably fairly rare (cmpxchg on i686, and the pcmpstr issue we've already run into, which is slightly different and won't be solved by this patch). And in any case, if we want to prevent it, I'm not sure this is the right patch. More generally, we probably just want to know which physregs are live at the insertion point.

What would the final machine output be under i686 for an INSERT_SUBREG with a non-GR32_ABCD operand, since only the ABCDs have an 8-bit sub-register?

It will be broken. :-)
What I meant is to INSERT_SUBREG into a GR32_ABCD, but zero a GR32. I think this leaves even fast RA with enough freedom to use a non-ABCD register for the MOV32r0 and then copy it into an appropriate register for the insert.

I'm not familiar with the pcmpstr issue. Would you mind providing a reference or example?

More generally, we probably just want to know which physregs are live at the insertion point.

What if we moved the pass from pre-reg alloc to, say, post-reg alloc or even pre-rewrite and scavenged for available GR32_ABCDs before the EFLAGS-defining insertion point?
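To make the scavenging idea concrete, here is a deliberately simplified, standalone C++ sketch. It is not LLVM code: `Reg`, `Instr`, `regs` and `scavengeAbcd` are hypothetical names, and the model assumes the caller already knows which registers hold values still needed after the setcc (`liveAcross`). A register counts as free only if nothing from the EFLAGS-defining instruction up to the setcc reads or writes it.

```cpp
#include <bitset>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical register model; NumRegs doubles as the "nothing found" result.
enum Reg : std::size_t { EAX, EBX, ECX, EDX, ESI, EDI, NumRegs };
using RegSet = std::bitset<NumRegs>;

// One instruction, reduced to the physical registers it writes and reads.
struct Instr {
  RegSet defs;
  RegSet uses;
};

// Convenience helper for building a register set.
RegSet regs(std::initializer_list<Reg> list) {
  RegSet s;
  for (Reg r : list)
    s.set(r);
  return s;
}

// `window` holds the instructions from the EFLAGS def up to, but not
// including, the setcc. Returns the first ABCD register that is neither
// touched in the window nor live across it, or NumRegs if there is none.
Reg scavengeAbcd(const std::vector<Instr> &window, RegSet liveAcross) {
  RegSet busy = liveAcross;
  for (const Instr &mi : window)
    busy |= mi.defs | mi.uses; // a read needs the old value, a write clobbers the zero
  for (Reg r : {EAX, EBX, ECX, EDX})
    if (!busy.test(r))
      return r;
  return NumRegs;
}
```

In this model a window containing only cmpxchg8b yields no free register, while a window containing a plain compare of EAX against memory leaves several candidates. A real implementation would also have to account for sub-register aliases and reserved registers.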

bryant added a reviewer: compnerd. Jul 11 2016, 12:54 PM

As I said on IRC - the post-RA option was suggested on D21774, and I'm not sure it's superior - it may miss other cases.
I don't object to doing this post-RA on principle, I'm just not sure it's better.

In any case, there are three completely separate issues here:

1. Fast RA should not break because it runs out of registers. We can keep tracking this as PR28489.
2. FixupSetCC should not run at -O0. I went ahead and disabled it (r275099), so we should no longer see (1) in practice, but it still should be fixed at some point.
3. We should not pessimize cmpxchg in optimized code. Could you please file a separate bug for this? Maybe moving the pass to post-RA is the right answer - although I'm still not convinced.

Revision Contents

Path	Size
lib/Target/X86/X86FixupSetCC.cpp	26 lines
test/CodeGen/X86/no-fixup-setcc.ll	13 lines

Diff 63535

lib/Target/X86/X86FixupSetCC.cpp

Context not available.

   // Return true if MI imp-uses eflags.
   bool impUsesFlags(MachineInstr *MI);
+  bool usesAllGR32_ABCD(const MachineInstr *) const;

   // Return true if this is the opcode of a SetCC instruction with a register
   // output.
Context not available.

   MachineRegisterInfo *MRI;
   const X86InstrInfo *TII;
+  const X86RegisterInfo *TRI;

   enum { SearchBound = 16 };

Context not available.
   return false;
 }

+bool X86FixupSetCCPass::usesAllGR32_ABCD(const MachineInstr *mi) const {
+  BitVector gr32abcd(TRI->getNumRegs());
+  gr32abcd.set(X86::EAX);
+  gr32abcd.set(X86::EBX);
+  gr32abcd.set(X86::ECX);
+  gr32abcd.set(X86::EDX);
+
+  for (const MachineOperand &op : mi->operands()) {
+    if (op.isReg() && !TRI->isVirtualRegister(op.getReg()) && op.getReg() &&
+        (op.isDef() || op.isUse() || op.isImplicit())) {
+      for (MCRegAliasIterator reg(op.getReg(), TRI, true); reg.isValid();
+           ++reg) {
+        gr32abcd.reset(*reg);
+      }
+    }
+  }
+
+  return gr32abcd.none();
+}

 bool X86FixupSetCCPass::runOnMachineFunction(MachineFunction &MF) {
   bool Changed = false;
   MRI = &MF.getRegInfo();
   TII = MF.getSubtarget<X86Subtarget>().getInstrInfo();
+  TRI = MF.getSubtarget<X86Subtarget>().getRegisterInfo();

   SmallVector<MachineInstr*, 4> ToErase;

Context not available.
       // Find the preceding instruction that imp-defs eflags.
       MachineInstr *FlagsDefMI = findFlagsImpDef(
           MI.getParent(), MachineBasicBlock::reverse_iterator(&MI));
-      if (!FlagsDefMI)
+      if (!FlagsDefMI || (!MF.getSubtarget<X86Subtarget>().is64Bit() &&
+                          usesAllGR32_ABCD(FlagsDefMI)))
         continue;

       // We'd like to put something that clobbers eflags directly before
Context not available.

test/CodeGen/X86/no-fixup-setcc.ll

This file was added.

+; RUN: llc < %s -O0 -march=x86 -o - | FileCheck %s
+
+define i1 @f(i64*, i64, i64) {
+; CHECK-LABEL: f
+; CHECK: cmpxchg8b
+; CHECK-NEXT: sete
+; CHECK-NEXT: movzbl
+  %a = cmpxchg i64* %0, i64 %1, i64 %2 seq_cst seq_cst
+  %b = extractvalue { i64, i1 } %a, 1
+  %c = zext i1 %b to i32
+  call void asm sideeffect "", "r"(i32 %c)
+  ret i1 %b
+}