Details

Reviewers

ab
craig.topper
MatzeB
n.bozhenov
a.elovikov

Commits

rGe12e08c68045: Handle the case of live 16-bit subregisters in X86FixupBWInsts
rL321674: Handle the case of live 16-bit subregisters in X86FixupBWInsts

Summary

This fixes Bugzilla report 35240 (https://bugs.llvm.org/show_bug.cgi?id=35240).

What was happening in that case is that we had an instruction that was copying R10B into R9B, and we wanted to promote it to a 32-bit move. This potentially leaves the upper 24 bits of R9D undefined. That's OK if those bits are never used, but if they are, we need to prevent the transformation.
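To make the hazard concrete, here is a minimal sketch in plain C++ (not LLVM code). `Reg32`, `mov8`, and `mov32` are hypothetical names that model one 32-bit register and the two kinds of move, under the assumption that an 8-bit move leaves bits 8..31 of the destination untouched while a 32-bit move overwrites all of them.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of one 32-bit register such as R9D (hypothetical, for illustration).
struct Reg32 {
  uint32_t bits;
  uint8_t low8() const { return static_cast<uint8_t>(bits); }    // like R9B
  uint16_t low16() const { return static_cast<uint16_t>(bits); } // like R9W
};

// Models an 8-bit register move: only bits 0..7 of dst change.
inline void mov8(Reg32 &dst, const Reg32 &src) {
  dst.bits = (dst.bits & 0xFFFFFF00u) | (src.bits & 0xFFu);
}

// Models the promoted 32-bit move: every bit of dst is overwritten.
inline void mov32(Reg32 &dst, const Reg32 &src) { dst.bits = src.bits; }
```

Both moves produce the same low byte, but a later reader of the 16-bit view sees different values, which is why the promotion is only safe when nothing above bit 7 is read afterwards.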

The Machine IR before the transformation in the failing test case looks like this:

BB#3: derived from LLVM BB %entry
    Live Ins: %EAX %EDX %ESI %R8B **%R10B %R9D**
    Predecessors according to CFG: BB#2 BB#1
  %CL<def> = MOV8rr %R8B<kill>
  %EDX<def,tied1> = SHR32rCL %EDX<kill,tied0>, %EFLAGS<imp-def,dead>, %CL<imp-use>
  MOV32mr %RSP, 1, %noreg, 24, %noreg, %EDX<kill>; mem:Volatile ST4[%k]
**  %R9B<def> = MOV8rr %R10B<kill>, %R9D<imp-use,kill>, %R9D<imp-def>**
  %CX<def> = MOV16rm %RSP, 1, %noreg, 18, %noreg; mem:Volatile LD2[%f](dereferenceable)
**  %CX<def,tied1> = OR16rr %CX<kill,tied0>, %R9W, %EFLAGS<imp-def,dead>**
 <snip>

As LivePhysRegs is stepping backward, it adds R9W to the live set, but when X86FixupBWInsts checks for liveness, it only checks R9D. This patch fixes that.

When a live register is added using LivePhysRegs::addReg(), all of its sub-registers are also marked as live, but its super-registers are not. In theory, we could take advantage of this behavior and only check the 16-bit sub-register for liveness (since if the 32-bit register is live, the 16-bit sub-register will be also), but the check is cheap and it seems preferable to have the code explicitly check the conditions it requires.
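The behavior described above can be sketched with a toy live set. This is not the real LivePhysRegs class: `ToyLiveRegs`, `subRegsOf`, and `upperBitsLookDead` are hypothetical names, the sub-register table covers only R9, and the only property modeled is the one stated in the summary (adding a register also adds its sub-registers, never its super-registers).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical sub-register table, limited to the R9 family.
inline std::vector<std::string> subRegsOf(const std::string &reg) {
  if (reg == "R9")  return {"R9D", "R9W", "R9B"};
  if (reg == "R9D") return {"R9W", "R9B"};
  if (reg == "R9W") return {"R9B"};
  return {};
}

// Toy stand-in for a physical-register live set.
class ToyLiveRegs {
  std::set<std::string> Live;

public:
  // Marks reg and all of its sub-registers live; super-registers are untouched.
  void addReg(const std::string &reg) {
    Live.insert(reg);
    for (const auto &sub : subRegsOf(reg))
      Live.insert(sub);
  }
  // Exact-membership query, with no super-register lookup.
  bool contains(const std::string &reg) const { return Live.count(reg) != 0; }
};

// Explicit form of the check: a write to R9B may be widened only if neither
// the 32-bit nor the 16-bit register is live afterwards.
inline bool upperBitsLookDead(const ToyLiveRegs &live) {
  return !live.contains("R9D") && !live.contains("R9W");
}
```

With only R9W added, a query for R9D alone reports "not live", which is the gap the bug fell through; querying both registers closes it.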

Diff Detail

Repository: rL LLVM

Event Timeline

andrew.w.kaylor created this revision. Nov 27 2017, 2:45 PM

Do you have a reduced test case to include?

FWIW At a first glance this feels to me as we should have the fixes in isLive() instead as that seems to roughly be "LivePhysRegs::contains() with tweaks" where the LivePhysRegs::contains() would check super registers.

MatzeB added a reviewer: a.elovikov. Nov 27 2017, 3:00 PM

In D40524#936926, @MatzeB wrote:

FWIW At a first glance this feels to me as we should have the fixes in isLive() instead as that seems to roughly be "LivePhysRegs::contains() with tweaks" where the LivePhysRegs::contains() would check super registers.

Yeah, that makes sense. The reason I didn't do that was that this code already had special case handling for the sub_8bit_hi subregister. I suppose a case could be made that that also should be in isLive(), but since this is the only caller of isLive() the special logic of isLive() could just as well be sunk here.

Either way, you're right that it's more distributed than it ought to be. I'll clean it up.

- Refactored getSuperRegDestIfDead() to consolidate the liveness checking logic.
- Added a test case.

a.elovikov added inline comments. Nov 28 2017, 8:16 AM

test/CodeGen/X86/fixup-bw-inst.mir
195 ↗	(On Diff #124496)	I'm surprised not to see "%R9D<imp-use,kill>, %R9D<imp-def>" from the Bugzilla here. IIUC the use below does not prevent anything because it might be use-undef, so I'm not sure what exactly in this test prevents the transformation.

andrew.w.kaylor added inline comments. Nov 28 2017, 11:45 AM

test/CodeGen/X86/fixup-bw-inst.mir
195 ↗	(On Diff #124496)	As the test is written, it hits the case where the super-register is detected as live after the instruction (because LivePhysRegs sees the use of r9w and so adds it to the live set) but we don't see either an implicit use or an implicit def of the register in the instruction, so we assume it must have been live before the instruction. I think that this will be true in actual Machine IR that has been produced normally. In this case, the MIR in the test is just incomplete. I suppose the use on line 197 isn't considered as "possibly undef" just because the MIR in the test doesn't say it is undef. I'll update this to make the test look like the actual failure that was seen in the wild. BTW, I did run this test without my patch and it fails (because r9d isn't live and we weren't checking r9w). The test also fails (even with my change) if I add 'implicit-def %r9d' but not 'implicit kill %r9d'.
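The implicit-operand reasoning in this comment can be sketched as a small standalone function. It is a simplified model, not the pass itself: `ImplicitOperand` and `superRegLooksDead` are hypothetical names, and registers are plain strings rather than LLVM register numbers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an implicit register operand of a MOV.
struct ImplicitOperand {
  std::string reg;
  bool isDef; // true for implicit-def, false for implicit use
};

// Called only when the super-register (or part of it) is marked live after
// the instruction. Returns true if the scan concludes that it is in fact
// dead, meaning the MOV may be widened.
inline bool superRegLooksDead(const std::vector<ImplicitOperand> &implicitOps,
                              const std::vector<std::string> &superRegs) {
  bool isDefined = false;
  for (const auto &op : implicitOps) {
    for (const auto &super : superRegs) {
      if (op.reg != super)
        continue;
      if (op.isDef)
        isDefined = true;
      else
        return false; // implicitly used, so it was live before the instruction
    }
  }
  // No implicit def at all: assume it was live both before and after.
  return isDefined;
}
```

Under this model, a MOV with no implicit operands blocks the promotion, a MOV with both `implicit-def %r9d` and `implicit killed %r9d` also blocks it, and a MOV with only `implicit-def %r9d` allows it, which matches the three test outcomes described in the comment.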

Updated the test case.

craig.topper added inline comments. Nov 29 2017, 10:39 AM

lib/Target/X86/X86FixupBWInsts.cpp
195 ↗	(On Diff #124496)	Drop the curlies?
204 ↗	(On Diff #124496)	Drop the curlies?

andrew.w.kaylor added inline comments. Nov 29 2017, 11:20 AM

lib/Target/X86/X86FixupBWInsts.cpp
204 ↗	(On Diff #124496)	I put the curlies here (and above) because they're inside a scope that needs curlies. Someone suggested to me once that I should use or not use curlies consistently throughout a given scope, but it may be that I'm applying that too broadly. clang-format seems to be happy either way. I'm also happy either way.

Death to the curlies!

LGTM

This revision is now accepted and ready to land. Nov 29 2017, 4:46 PM

I'm not sure if the following is possible/legal but it fails even with this patch:

body:             |
  bb.0:
    successors:
    liveins: %ch, %bl

    %cl = MOV8rr %bl, implicit-def %cx, implicit killed %ch, implicit-def %eflags
    ; CHECK-NOT: MOV32rr
    RETQ %cx

If that's a valid testcase, then the "a little more checking to do" logic (isLive before this patch) has to be fixed. If it's invalid, we'd need an assert for that too (or some mir-verify check?). However, that probably should not be part of this change, so I'm just asking for your opinion about that testcase.
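Why this testcase still gets promoted can be traced with a simplified model of the patched check. This is an illustrative sketch only: `ImplicitOp` and `wouldPromoteCl` are hypothetical names, the destination is hard-coded to %cl, and registers are plain strings.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct ImplicitOp {
  std::string reg;
  bool isDef; // true for implicit-def, false for implicit use
};

// Simplified decision for a MOV8rr that writes %cl. liveAfter holds the
// registers marked live after the instruction.
inline bool wouldPromoteCl(const std::set<std::string> &liveAfter,
                           const std::vector<ImplicitOp> &implicitOps) {
  auto live = [&](const char *r) { return liveAfter.count(r) != 0; };
  // Fast path: nothing wider than %cl, and not its sibling %ch, is live.
  if (!live("ecx") && !live("cx") && !live("ch"))
    return true;
  // "A little more checking": only %cl and its super-registers are compared
  // against the implicit operands, so an implicit use of %ch is never seen.
  const std::vector<std::string> supers = {"cl", "cx", "ecx", "rcx"};
  bool isDefined = false;
  for (const auto &op : implicitOps) {
    for (const auto &super : supers) {
      if (op.reg != super)
        continue;
      if (op.isDef)
        isDefined = true;
      else
        return false;
    }
  }
  return isDefined;
}
```

For the testcase above, `RETQ %cx` makes %cx, %cl, and %ch live, the scan finds `implicit-def %cx`, and the implicit use of %ch is skipped because %ch is a sibling rather than a super-register of %cl. The model therefore reports that the move would be widened, overwriting %ch.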

a.elovikov added inline comments. Nov 30 2017, 4:12 AM

test/CodeGen/X86/fixup-bw-inst.mir
197 ↗	(On Diff #124800)	And it would be better to have an additional testcase based on this one but with %r9 changed to some register that has 8bit_hi subregister.

In D40524#940330, @a.elovikov wrote:
I'm not sure if the following is possible/legal but it fails even with this patch:
body:             |
  bb.0:
    successors:
    liveins: %ch, %bl

    %cl = MOV8rr %bl, implicit-def %cx, implicit killed %ch, implicit-def %eflags
    ; CHECK-NOT: MOV32rr
    RETQ %cx
If that's a valid testcase, then the "a little more checking to do" logic (isLive before this patch) has to be fixed. If it's invalid, we'd need an assert for that too (or some mir-verify check?). However, that probably should not be part of this change, so I'm just asking for your opinion about that testcase.

Hi Andrei,

I apologize for the long delay in responding. I've been distracted with other things and haven't committed this change yet.

I'm not terribly familiar with MIR as it appears in a test like this, but I can't see any reason that your proposed testcase would be invalid. I do think it's extremely unlikely to arise in actual generated Machine IR, since we hardly ever use the 8-bit high registers. What I'd like to do is commit the change from this review as is, and file a new Bugzilla report to track the 8bit_hi issue. Does that sound reasonable to you?

In D40524#951832, @andrew.w.kaylor wrote:
I'm not terribly familiar with MIR as it appears in a test like this, but I can't see any reason that your proposed testcase would be invalid. I do think it's extremely unlikely to arise in actual generated Machine IR, since we hardly ever use the 8-bit high registers. What I'd like to do is commit the change from this review as is, and file a new Bugzilla report to track the 8bit_hi issue. Does that sound reasonable to you?

Yes, these are two separate issues, IMO.

@andrew.w.kaylor What is happening with this patch, will you be able to commit it for Jan 3 to fix PR35240? What about the high 8-bit register issue - is there a bugzilla (and repro) for this?

In D40524#965331, @RKSimon wrote:

@andrew.w.kaylor What is happening with this patch, will you be able to commit it for Jan 3 to fix PR35240? What about the high 8-bit register issue - is there a bugzilla (and repro) for this?

Yes. I apologize. I've had some other priorities that pushed this to the back burner, but I should be able to commit it today, assuming a clean merge with the latest code base. I have not investigated the high 8-bit issue, but I believe Andrei's example will demonstrate a problem. I think it is very unlikely to occur during actual compilation.

Closed by commit rL321674: Handle the case of live 16-bit subregisters in X86FixupBWInsts (authored by akaylor). Jan 2 2018, 1:05 PM

This revision was automatically updated to reflect the committed changes.

In D40524#965913, @andrew.w.kaylor wrote:

In D40524#965331, @RKSimon wrote:

@andrew.w.kaylor What is happening with this patch, will you be able to commit it for Jan 3 to fix PR35240? What about the high 8-bit register issue - is there a bugzilla (and repro) for this?

Yes. I apologize. I've had some other priorities that pushed this to the back burner, but I should be able to commit it today, assuming a clean merge with the latest code base. I have not investigated the high 8-bit issue, but I believe Andrei's example will demonstrate a problem. I think it is very unlikely to occur during actual compilation.

Cheers - did you create the bug report in the end? I can't seem to find it.

a.elovikov mentioned this in D42533: [X86FixupBWInsts] Fix miscompilation if sibling sub-register is live. Jan 25 2018, 5:57 AM

a.elovikov mentioned this in rL323635: [X86FixupBWInsts] Fix miscompilation if sibling sub-register is live. Jan 29 2018, 1:29 AM

In D40524#969410, @RKSimon wrote:

Cheers - did you create the bug report in the end? I can't seem to find it.

So I never did get around to filing the bug report, but @a.elovikov has already fixed this. Thanks, Andrei!

Diff 128452

llvm/trunk/lib/Target/X86/X86FixupBWInsts.cpp

 bool FixupBWInstPass::runOnMachineFunction(MachineFunction &MF) {
 [...]
   for (auto &MBB : MF)
     processBasicBlock(MF, MBB);

   DEBUG(dbgs() << "End X86FixupBWInsts\n";);

   return true;
 }

-/// Check if register \p Reg is live after the \p MI.
-///
-/// \p LiveRegs should be in a state describing liveness information in
-/// that exact place as this function tries to precise analysis made
-/// by \p LiveRegs by exploiting the information about particular
-/// instruction \p MI. \p MI is expected to be one of the MOVs handled
-/// by the x86FixupBWInsts pass.
-/// Note: similar to LivePhysRegs::contains this would state that
-/// super-register is not used if only some part of it is used.
-///
-/// X86 backend does not have subregister liveness tracking enabled,
-/// so liveness information might be overly conservative. However, for
-/// some specific instructions (this pass only cares about MOVs) we can
-/// produce more precise results by analysing that MOV's operands.
-///
-/// Indeed, if super-register is not live before the mov it means that it
-/// was originally <read-undef> and so we are free to modify these
-/// undef upper bits. That may happen in case where the use is in another MBB
-/// and the vreg/physreg corresponding to the move has higher width than
-/// necessary (e.g. due to register coalescing with a "truncate" copy).
-/// So, it handles pattern like this:
-///
-/// %bb.2: derived from LLVM BB %if.then
-/// Live Ins: %rdi
-/// Predecessors according to CFG: %bb.0
-/// %ax = MOV16rm killed %rdi, 1, %noreg, 0, %noreg, implicit-def %eax;
-/// mem:LD2[%p]
-/// No implicit %eax
-/// Successors according to CFG: %bb.3(?%)
-///
-/// %bb.3: derived from LLVM BB %if.end
-/// Live Ins: %eax Only %ax is actually live
-/// Predecessors according to CFG: %bb.2 %bb.1
-/// %ax = KILL %ax, implicit killed %eax
-/// RET 0, %ax
-static bool isLive(const MachineInstr &MI,
-                   const LivePhysRegs &LiveRegs,
-                   const TargetRegisterInfo *TRI,
-                   unsigned Reg) {
-  if (!LiveRegs.contains(Reg))
+/// \brief Check if after \p OrigMI the only portion of super register
+/// of the destination register of \p OrigMI that is alive is that
+/// destination register.
+///
+/// If so, return that super register in \p SuperDestReg.
+bool FixupBWInstPass::getSuperRegDestIfDead(MachineInstr *OrigMI,
+                                            unsigned &SuperDestReg) const {
+  auto *TRI = &TII->getRegisterInfo();
+
+  unsigned OrigDestReg = OrigMI->getOperand(0).getReg();
+  SuperDestReg = getX86SubSuperRegister(OrigDestReg, 32);
+
+  const auto SubRegIdx = TRI->getSubRegIndex(SuperDestReg, OrigDestReg);
+
+  // Make sure that the sub-register that this instruction has as its
+  // destination is the lowest order sub-register of the super-register.
+  // If it isn't, then the register isn't really dead even if the
+  // super-register is considered dead.
+  if (SubRegIdx == X86::sub_8bit_hi)
     return false;

-  unsigned Opc = MI.getOpcode(); (void)Opc;
+  // If neither the destination-super register nor any applicable subregisters
+  // are live after this instruction, then the super register is safe to use.
+  if (!LiveRegs.contains(SuperDestReg)) {
+    // If the original destination register was not the low 8-bit subregister
+    // then the super register check is sufficient.
+    if (SubRegIdx != X86::sub_8bit)
+      return true;
+    // If the original destination register was the low 8-bit subregister and
+    // we also need to check the 16-bit subregister and the high 8-bit
+    // subregister.
+    if (!LiveRegs.contains(getX86SubSuperRegister(OrigDestReg, 16)) &&
+        !LiveRegs.contains(getX86SubSuperRegister(SuperDestReg, 8,
+                                                  /*High=*/true)))
+      return true;
+    // Otherwise, we have a little more checking to do.
+  }
+
+  // If we get here, the super-register destination (or some part of it) is
+  // marked as live after the original instruction.
+  //
+  // The X86 backend does not have subregister liveness tracking enabled,
+  // so liveness information might be overly conservative. Specifically, the
+  // super register might be marked as live because it is implicitly defined
+  // by the instruction we are examining.
+  //
+  // However, for some specific instructions (this pass only cares about MOVs)
+  // we can produce more precise results by analysing that MOV's operands.
+  //
+  // Indeed, if super-register is not live before the mov it means that it
+  // was originally <read-undef> and so we are free to modify these
+  // undef upper bits. That may happen in case where the use is in another MBB
+  // and the vreg/physreg corresponding to the move has higher width than
+  // necessary (e.g. due to register coalescing with a "truncate" copy).
+  // So, we would like to handle patterns like this:
+  //
+  // %bb.2: derived from LLVM BB %if.then
+  // Live Ins: %rdi
+  // Predecessors according to CFG: %bb.0
+  // %ax<def> = MOV16rm killed %rdi, 1, %noreg, 0, %noreg, implicit-def %eax
+  // ; No implicit %eax
+  // Successors according to CFG: %bb.3(?%)
+  //
+  // %bb.3: derived from LLVM BB %if.end
+  // Live Ins: %eax Only %ax is actually live
+  // Predecessors according to CFG: %bb.2 %bb.1
+  // %ax = KILL %ax, implicit killed %eax
+  // RET 0, %ax
+  unsigned Opc = OrigMI->getOpcode(); (void)Opc;
   // These are the opcodes currently handled by the pass, if something
   // else will be added we need to ensure that new opcode has the same
   // properties.
   assert((Opc == X86::MOV8rm || Opc == X86::MOV16rm || Opc == X86::MOV8rr ||
           Opc == X86::MOV16rr) &&
          "Unexpected opcode.");

   bool IsDefined = false;
-  for (auto &MO: MI.implicit_operands()) {
+  for (auto &MO: OrigMI->implicit_operands()) {
     if (!MO.isReg())
       continue;

     assert((MO.isDef() || MO.isUse()) && "Expected Def or Use only!");

-    for (MCSuperRegIterator Supers(Reg, TRI, true); Supers.isValid(); ++Supers) {
+    for (MCSuperRegIterator Supers(OrigDestReg, TRI, true); Supers.isValid();
+         ++Supers) {
       if (*Supers == MO.getReg()) {
         if (MO.isDef())
           IsDefined = true;
         else
-          return true; // SuperReg Imp-used' -> live before the MI
+          return false; // SuperReg Imp-used' -> live before the MI
       }
     }
   }
   // Reg is not Imp-def'ed -> it's live both before/after the instruction.
   if (!IsDefined)
-    return true;
+    return false;

   // Otherwise, the Reg is not live before the MI and the MOV can't
   // make it really live, so it's in fact dead even after the MI.
-  return false;
-}
-
-/// \brief Check if after \p OrigMI the only portion of super register
-/// of the destination register of \p OrigMI that is alive is that
-/// destination register.
-///
-/// If so, return that super register in \p SuperDestReg.
-bool FixupBWInstPass::getSuperRegDestIfDead(MachineInstr *OrigMI,
-                                            unsigned &SuperDestReg) const {
-  auto *TRI = &TII->getRegisterInfo();
-
-  unsigned OrigDestReg = OrigMI->getOperand(0).getReg();
-  SuperDestReg = getX86SubSuperRegister(OrigDestReg, 32);
-
-  const auto SubRegIdx = TRI->getSubRegIndex(SuperDestReg, OrigDestReg);
-
-  // Make sure that the sub-register that this instruction has as its
-  // destination is the lowest order sub-register of the super-register.
-  // If it isn't, then the register isn't really dead even if the
-  // super-register is considered dead.
-  if (SubRegIdx == X86::sub_8bit_hi)
-    return false;
-
-  if (isLive(*OrigMI, LiveRegs, TRI, SuperDestReg))
-    return false;
-
-  if (SubRegIdx == X86::sub_8bit) {
-    // In the case of byte registers, we also have to check that the upper
-    // byte register is also dead. That is considered to be independent of
-    // whether the super-register is dead.
-    unsigned UpperByteReg =
-        getX86SubSuperRegister(SuperDestReg, 8, /*High=*/true);
-
-    if (isLive(*OrigMI, LiveRegs, TRI, UpperByteReg))
-      return false;
-  }
-
   return true;
 }

 MachineInstr *FixupBWInstPass::tryReplaceLoad(unsigned New32BitOpcode,
                                               MachineInstr *MI) const {
   unsigned NewDestReg;

   // We are going to try to rewrite this load to a larger zero-extending
 [...]

llvm/trunk/test/CodeGen/X86/fixup-bw-inst.mir

 [...]
   if.then: ; preds = %entry
     %0 = load i16, i16* %p, align 2
     br label %if.end

   if.end: ; preds = %if.then, %entry
     %i.0 = phi i16 [ %0, %if.then ], [ 0, %entry ]
     ret i16 %i.0
   }

+  define i16 @test4() {
+  entry:
+    %t1 = zext i1 undef to i16
+    %t2 = or i16 undef, %t1
+    ret i16 %t2
+  }
 ...
 ---
 # CHECK-LABEL: name: test1
 name: test1
 alignment: 4
 exposesReturnsTwice: false
 legalized: false
 regBankSelected: false
 [...]
   bb.2.if.then:
 [...]
     RETQ %ax

   bb.1:
     %eax = XOR32rr undef %eax, undef %eax, implicit-def dead %eflags
     %ax = KILL %ax, implicit killed %eax
     RETQ %ax

 ...
+---
+# CHECK-LABEL: name: test4
+name: test4
+alignment: 4
+exposesReturnsTwice: false
+legalized: false
+regBankSelected: false
+selected: false
+tracksRegLiveness: true
+registers:
+liveins:
+  - { reg: '%r9d' }
+frameInfo:
+  isFrameAddressTaken: false
+  isReturnAddressTaken: false
+  hasStackMap: false
+  hasPatchPoint: false
+  stackSize: 0
+  offsetAdjustment: 0
+  maxAlignment: 0
+  adjustsStack: false
+  hasCalls: false
+  stackProtector: ''
+  maxCallFrameSize: 0
+  hasOpaqueSPAdjustment: false
+  hasVAStart: false
+  hasMustTailInVarArgFunc: false
+  savePoint: ''
+  restorePoint: ''
+fixedStack:
+stack:
+constants:
+# This code copies r10b into r9b and then uses r9w. We would like to promote
+# the copy to a 32-bit copy, but because r9w is used this is not acceptable.
+body: |
+  bb.0.entry:
+    successors:
+    liveins: %r9d
+
+    %r9b = MOV8rr undef %r10b, implicit-def %r9d, implicit killed %r9d, implicit-def %eflags
+    ; CHECK-NOT: MOV32rr
+    %ax = OR16rr undef %ax, %r9w, implicit-def %eflags
+    RETQ %ax
+...

This is an archive of the discontinued LLVM Phabricator instance.

Handle the case of live 16-bit subregisters in X86FixupBWInsts
ClosedPublic
