This is an archive of the discontinued LLVM Phabricator instance.

x86: Emit LAHF/SAHF instead of PUSHF/POPF
ClosedPublic

Authored by jfb on Dec 11 2014, 2:58 PM.

Details

Summary

NaCl's sandbox doesn't allow PUSHF/POPF out of security concerns (priviledged emulators have forgotten to mask system bits in the past, and EFLAGS's DF bit is a constant source of hilarity). Commit r220529 fixed PR20376 by saving cmpxchg's flags result using EFLAGS, this commit now generated LAHF/SAHF instead, for all of x86 (not just NaCl) because it leads to an overall performance gain over PUSHF/POPF.

As with the previous patch this code generation is pretty bad because it occurs very later, after register allocation, and in many cases it rematerializes flags which were already available (e.g. already in a register through SETE). Fortunately it's somewhat rare that this code needs to fire.

I did a bit of benchmarking, the results on an Intel Haswell E5-2690 CPU at 2.9GHz are:

Time per call (ms)Runtime (ms)Benchmark
0.0000125146257sete.i386
0.0000128106405sete.i386-fast
0.0000104565228sete.x86-64
0.0000104965248sete.x86-64-fast
0.0000129066453lahf-sahf.i386
0.0000132366618lahf-sahf.i386-fast
0.0000105805290lahf-sahf.x86-64
0.0000103045152lahf-sahf.x86-64-fast
0.00002805614028pushf-popf.i386
0.00002716013580pushf-popf.i386-fast
0.00002381011905pushf-popf.x86-64
0.00002646813234pushf-popf.x86-64-fast

Clearly PUSHF/POPF are suboptimal. It doesn't really seems to be worth teaching LLVM about individual flags, at least not for this purpose.

Diff Detail

Event Timeline

jfb updated this revision to Diff 17195.Dec 11 2014, 2:58 PM
jfb retitled this revision from to x86 NaCl: Emit LAHF/SAHF instead of PUSHF/POPF.
jfb updated this object.
jfb edited the test plan for this revision. (Show Details)
jfb added reviewers: t.p.northover, jvoung.
jfb added a subscriber: Unknown Object (MLST).
t.p.northover added inline comments.Dec 12 2014, 7:17 AM
lib/Target/X86/X86InstrInfo.cpp
3284–3285

I don't think this works. This expansion occurs post-RA doesn't it? In which case AX could be live.

jfb added inline comments.Dec 12 2014, 8:46 AM
lib/Target/X86/X86InstrInfo.cpp
3284–3285

Correct, that's why AX is pushed/popped before and after lahf/sahf.

rnk added a subscriber: rnk.Dec 12 2014, 10:33 AM

LLVM should never, ever generate popf from normal code. It's ridiculously slow. We should use sahf/lahf if we can.

jfb added a comment.Dec 12 2014, 10:38 AM

If you think it's useful I can make the code unconditional w.r.t. NaCl, and
make sahf/lahf the only thing generated here. I don't think performance is
a real argument here, though, since this code path shouldn't get exercised
often. If we really cared about performance then we wouldn't hit this path
in the first place!

In my second attempt to make an intelligent comment here...

Doesn't this neglect the overflow flag?

lib/Target/X86/X86InstrInfo.cpp
3284–3285

Oh bother, don't know how I missed that. Sorry.

jfb updated this revision to Diff 17245.Dec 12 2014, 1:53 PM
jfb edited edge metadata.

Save OF too.

jfb added a comment.Dec 12 2014, 1:54 PM

In my second attempt to make an intelligent comment here...

Doesn't this neglect the overflow flag?

Eek you're totally right... I updated the patch, and if you thought the first one was horrible you'll find this one even better!

OK, I'm afraid I have a few more questions:

  • What happens for "EAX = COPY EFLAGS"? (It's not good).
  • I'm not sure what's available here, but there are ways to check liveness too, which would allow the push/pop to be skipped entirely if AX is dead.
  • We probably want to be more careful with the flags on the instructions (at the least LAHF & SAHF should be marked with EFLAGS I think, probably we should get the kill states right too).
  • With all this complexity, it might be time for a helper function.

And while we're at it, I think it'd be good to have a single code sequence too. Unless this dance turns out to be slower than pushf/popf (not impossible, even if those are slow).

jfb added a comment.Dec 15 2014, 3:41 PM

OK, I'm afraid I have a few more questions:

  • What happens for "EAX = COPY EFLAGS"? (It's not good).

You mean if the user actually wants to copy EFLAGS for system code? Indeed that would be broken. Currently the test's code for test_intervening_call sees the following after ISel lowering:

*** MachineFunction at end of ISel ***
# Machine code for function test_intervening_call: SSA
Function Live Ins: %EDI in %vreg0, %RSI in %vreg1, %RDX in %vreg2

BB#0: derived from LLVM BB %0
    Live Ins: %EDI %RSI %RDX
	%vreg2<def> = COPY %RDX; GR64:%vreg2
	%vreg1<def> = COPY %RSI; GR64:%vreg1
	%vreg0<def> = COPY %EDI; GR32:%vreg0
	%RAX<def> = COPY %vreg1; GR64:%vreg1
	LCMPXCHG64 %vreg0, 1, %noreg, 0, %noreg, %vreg2, %RAX<imp-def>, %EFLAGS<imp-def>, %RAX<imp-use>; mem:Volatile LDST8[%foo] GR32:%vreg0 GR64:%vreg2
	%vreg3<def> = COPY %RAX; GR64:%vreg3
	%vreg4<def> = COPY %EFLAGS; GR64:%vreg4
	ADJCALLSTACKDOWN32 0, %ESP<imp-def,dead>, %EFLAGS<imp-def,dead>, %ESP<imp-use>
	CALLpcrel32 <ga:@bar>, <regmask>, %ESP<imp-use>, %ESP<imp-def>, %EAX<imp-def>
	ADJCALLSTACKUP32 0, 0, %ESP<imp-def,dead>, %EFLAGS<imp-def,dead>, %ESP<imp-use>
	%vreg5<def> = COPY %EAX; GR32:%vreg5
	%EFLAGS<def> = COPY %vreg4; GR64:%vreg4
	JNE_4 <BB#2>, %EFLAGS<imp-use>
	JMP_4 <BB#1>
    Successors according to CFG: BB#1(16) BB#2(16)

AFAICT the code I wrote doesn't have a way to distinguish between the user asking for a true copy of EFLAGS, and ISel deciding that it needed to save/restore flags and deciding that EFLAGS was the way to go. This isn't an issue for NaCl because there isn't such a thing as copying EFLAGS. For system-mode LLVM trying to compile e.g. Linux that could be an issue, though I assume that this is done with inline assembly and not IR. Maybe ISel lowering should be changed? What do you think?

  • I'm not sure what's available here, but there are ways to check liveness too, which would allow the push/pop to be skipped entirely if AX is dead.

I'm not very familiar with this code, guidance appreciated. That looks like something PeepholeOptimizer.cpp should do (like it does with X86InstrInfo::optimizeCompareInstr and similar), but again ISel doesn't give it quite the information it needs to figure it out: we'd need to call out individual flags that are live instead, and for LAHF/SAHF + SETO for the general case, and depending on the subset of flags live otherwise we could do a simple SET<cc>. GCC manages to do this, and the repro that I posted on the original PR is much cleaner in GCC's case because is realizes that the flag it care about has already been materialized.

Do we care to do this, though? This code shouldn't get emitted often, so is it worth optimizing/testing/maintaining?

  • We probably want to be more careful with the flags on the instructions (at the least LAHF & SAHF should be marked with EFLAGS I think, probably we should get the kill states right too).

I think that's already the case, unless I misunderstand what you want. See test_intervening_call after post-RA expansion (when LAHF/SAHF get added):

# *** IR Dump After Post-RA pseudo instruction expansion pass ***:
# Machine code for function test_intervening_call: Post SSA
Frame Objects:
  fi#-1: size=8, align=16, fixed, at location [SP-8]
Function Live Ins: %EDI in %vreg0, %RSI in %vreg1, %RDX in %vreg2

BB#0: derived from LLVM BB %0
    Live Ins: %EDI %RSI %RDX %RBX
	PUSH64r %RBX<kill>, %RSP<imp-def>, %RSP<imp-use>; flags: FrameSetup
	CFI_INSTRUCTION <call frame instruction>
	CFI_INSTRUCTION <call frame instruction>
	%RAX<def> = MOV64rr %RSI<kill>
	LCMPXCHG64 %EDI<kill>, 1, %noreg, 0, %noreg, %RDX<kill>, %RAX<imp-def,dead>, %EFLAGS<imp-def>, %RAX<imp-use,kill>; mem:Volatile LDST8[%foo]
	PUSH64r %RAX, %RSP<imp-def>, %RSP<imp-use>
	%RAX<def> = SETOr %EFLAGS<imp-use>
	LAHF %AH<imp-def>, %EFLAGS<imp-use>
	%RBX<def> = MOV64rr %RAX
	%RAX<def> = POP64r %RSP<imp-def>, %RSP<imp-use>
	CALLpcrel32 <ga:@bar>, <regmask>, %ESP<imp-use>, %ESP<imp-def>, %EAX<imp-def,dead>
	PUSH64r %RAX, %RSP<imp-def>, %RSP<imp-use>
	%RAX<def> = MOV64rr %RBX<kill>
	%AL<def,tied1> = ADD8ri %AL<tied0>, 127, %EFLAGS<imp-def>
	SAHF %EFLAGS<imp-def>, %AH<imp-use>
	%RAX<def> = POP64r %RSP<imp-def>, %RSP<imp-use>
	JNE_4 <BB#2>, %EFLAGS<imp-use>
    Successors according to CFG: BB#1(16) BB#2(16)
  • With all this complexity, it might be time for a helper function.

You mean: I should pull the EFLAGS handling out of X86InstrInfo::copyPhysReg? I can do that, I just want to make sure that's what you're suggesting.

And while we're at it, I think it'd be good to have a single code sequence too. Unless this dance turns out to be slower than pushf/popf (not impossible, even if those are slow).

That'll depend on the answer to your above question: does this break system code that builds with LLVM? I don't think it does (since it probably uses inline assembly). If that's the case then I can craft a benchmark and see what the cost is, which should give us the data needed to decide what to do.

jvoung added inline comments.Dec 15 2014, 5:24 PM
test/CodeGen/X86/cmpxchg-clobber-flags.ll
5

IIRC adding "-verify-machineinstr" to these test commandline flags can help check that the liveness info is correct, so it might help verify that you have the right kill states.

jfb added inline comments.Dec 16 2014, 11:43 AM
test/CodeGen/X86/cmpxchg-clobber-flags.ll
5

This found issues, I sent a separate patch for some of them: D6687.

jfb updated this revision to Diff 17356.Dec 16 2014, 12:44 PM

Rebase to grab cahnges from D6629, including -verify-machineinstrs. I still need to fix the new tests, which now fail verification.

jfb updated this revision to Diff 17481.Dec 18 2014, 8:13 PM

Check liveness of AX before push/pop.
SETcc takes 8-bit register AL.

jfb added a comment.Dec 18 2014, 8:19 PM

Following up with @t.p.northover's earlier comments:

  • What happens for "EAX = COPY EFLAGS"? (It's not good).

You mean if the user actually wants to copy EFLAGS for system code? Indeed that would be broken.

No, I mean if LLVM happened to decide to spill EFLAGS to EAX (allocate
vreg4 to EAX in your example). The final pop would clobber the value
we'd just carefully constructed.

  • I'm not sure what's available here, but there are ways to check liveness too, which would allow the push/pop to be skipped entirely if AX is dead.

I'm not very familiar with this code, guidance appreciated.

I meant MachineBasicBlock::computeRegisterLiveness. If it tells us AX
might be live, save it, otherwise skip it.

Done in my latest update, though it looks like this is exposing a bug. I think cmpxchg is correctly annotated as def+kill of EAX, but the liveness analysis seems to think it's dead:

BB#0: derived from LLVM BB %0
    Live Ins: %EBX %ESI
	PUSH32r %EBX<kill>, %ESP<imp-def>, %ESP<imp-use>; flags: FrameSetup
	CFI_INSTRUCTION <call frame instruction>
	PUSH32r %ESI<kill>, %ESP<imp-def>, %ESP<imp-use>; flags: FrameSetup
	CFI_INSTRUCTION <call frame instruction>
	CFI_INSTRUCTION <call frame instruction>
	CFI_INSTRUCTION <call frame instruction>
	%EAX<def> = MOV32rm %ESP, 1, %noreg, 16, %noreg; mem:LD4[FixedStack-2]
	%EDX<def> = MOV32rm %ESP, 1, %noreg, 20, %noreg; mem:LD4[FixedStack-3]
	%EBX<def> = MOV32rm %ESP, 1, %noreg, 24, %noreg; mem:LD4[FixedStack-4]
	%ECX<def> = MOV32rm %ESP, 1, %noreg, 28, %noreg; mem:LD4[FixedStack-5]
	%ESI<def> = MOV32rm %ESP, 1, %noreg, 12, %noreg; mem:LD4[FixedStack-1]
	LCMPXCHG8B %ESI<kill>, 1, %noreg, 0, %noreg, %EAX<imp-def,dead>, %EDX<imp-def,dead>, %EFLAGS<imp-def>, %EAX<imp-use>, %EBX<imp-use>, %ECX<imp-use>, %EDX<imp-use>; mem:Volatile LDST8[%foo]
	%AL<def> = SETOr %EFLAGS<imp-use>
	LAHF %AH<imp-def>, %EFLAGS<imp-use>
	%ESI<def> = MOV32rr %EAX
	CALLpcrel32 <ga:@bar>, <regmask>, %ESP<imp-use>, %ESP<imp-def>, %EAX<imp-def,dead>
	%EAX<def> = MOV32rr %ESI<kill>
	%AL<def,tied1> = ADD8ri %AL<tied0>, 127, %EFLAGS<imp-def>
	SAHF %EFLAGS<imp-def>, %AH<imp-use>
	JNE_4 <BB#3>, %EFLAGS<imp-use>
    Successors according to CFG: BB#1(16) BB#3(16)
  • We probably want to be more careful with the flags on the instructions (at the least LAHF & SAHF should be marked with EFLAGS I think, probably we should get the kill states right too).

I think that's already the case, unless I misunderstand what you want. See test_intervening_call after post-RA expansion (when LAHF/SAHF get added):

Oh good, it obviously comes from the MCInstrDesc somehow. The flags
aren't quite correct though:

I updated push to kill AX. I'm not sure which other maintenance should be done on the liveness state.

  • With all this complexity, it might be time for a helper function.

You mean: I should pull the EFLAGS handling out of X86InstrInfo::copyPhysReg? I can do that, I just want to make sure that's what you're suggesting.

Yep, that's what I meant.

Haven't done that yet, just to keep the diff easier to follow.

jfb updated this revision to Diff 17482.Dec 18 2014, 8:32 PM

Update test to properly eliminate dead AX push/pop. My previous comment was wrong: AX was indeed defined but subsequently unused. The test now exercise live AX before and after the call, and everything looks good.

jfb added a comment.Dec 18 2014, 8:35 PM

Alright, here's what's left to figure out:

  • Is liveness maintenance correct?
  • Should I handle individual flags more precisely? That means I could emit as little as a single SETcc, or just LAHF/SAHF and omit SETO.
  • Should I make the NaCl path the only one (e.g. no PUSHF/POPF)?
  • Refactor the code, if pulling out a function makes sense.
jfb updated this revision to Diff 17497.Dec 19 2014, 9:02 AM

CALL is now preceded by MOV since @bar takes arguments.

Re-running the NaCl tests, only i386 works. The -pre-RA-sched=fast version fails, and so do both x86-64 versions, because the calling convention and register allocator collude to change AX's liveness from one version to another. I could have 2 CHECK tests for i386, and 1 for x86-64.

jfb added a comment.Dec 19 2014, 12:10 PM

I did a bit of benchmarking, the results on an Intel Haswell E5-2690 CPU at 2.9GHz are:

Time per call (ms)Runtime (ms)Benchmark
0.0000125146257sete.i386
0.0000128106405sete.i386-fast
0.0000104565228sete.x86-64
0.0000104965248sete.x86-64-fast
0.0000129066453lahf-sahf.i386
0.0000132366618lahf-sahf.i386-fast
0.0000105805290lahf-sahf.x86-64
0.0000103045152lahf-sahf.x86-64-fast
0.00002805614028pushf-popf.i386
0.00002716013580pushf-popf.i386-fast
0.00002381011905pushf-popf.x86-64
0.00002646813234pushf-popf.x86-64-fast

Clearly PUSHF/POPF are suboptimal, I'll therefore delete that code and only keep the NaCl code (and make it non-NaCl specific). I'll update the tests to show the differences in register allocation. It doesn't really seems to be worth teaching LLVM about individual flags, at least not for this purpose.

There also seems to be a bug in x86-64 when generating code for test_feed_cmov.

jfb updated this revision to Diff 17522.Dec 19 2014, 2:56 PM

Remove the NaCl specificity, and never emit PUSHF/POPF. Update the test accordingly.

There now seems to be a bug with the code generation in x86-64 for test_feed_cmov. The following gets generated:

test_feed_cmov:
	pushq	%r14
	pushq	%rbx
	pushq	%rax
	movl	%edx, %ebx
	movl	%esi, %eax
	lock
	cmpxchgl	%ebx, (%rdi)
	seto	%al
	lahf
	movq	%rax, %r14
	callq	foo
	movq	%r14, %rax
	addb	$127, %al
	sahf
	cmovel	%ebx, %eax
	addq	$8, %rsp
	popq	%rbx
	popq	%r14
	retq

The bitcode:

define i32 @test_feed_cmov(i32* %addr, i32 %desired, i32 %new) {
  %res = cmpxchg i32* %addr, i32 %desired, i32 %new seq_cst seq_cst
  %success = extractvalue { i32, i1 } %res, 1
  %rhs = call i32 @foo()
  %ret = select i1 %success, i32 %new, i32 %rhs
  ret i32 %ret
}

IIUC it's correctly returning %ebx (%new) on success but on failure it's returning %eax which is clobbered 3 times after callq sets it. I think CMOV's use/def isn't annotated properly:

	%EAX<def,tied1> = CMOVE32rr %EAX<kill,tied0>, %EBX<kill>, %EFLAGS<imp-use>

I'm trying to figure out the dark magic that tablegen conjured into X86InstrCMovSetCC.td, and which incantation would nullify this powerful spell.

jfb added a comment.Dec 19 2014, 3:21 PM

At this point I think the issue is in how liveness is tagged onto CMOV, but I'm not sure I understand how that happens from the tablegen files. Post-RA pseudo instruction expansion pass transforms the following:

	CALL64pcrel32 <ga:@foo>, <regmask>, %RSP<imp-use>, %RSP<imp-def>, %EAX<imp-def>
	%EFLAGS<def> = COPY %R14<kill>
	%EAX<def,tied1> = CMOVE32rr %EAX<kill,tied0>, %EBX<kill>, %EFLAGS<imp-use>
	%RSP<def,tied1> = ADD64ri8 %RSP<tied0>, 8, %EFLAGS<imp-def,dead>
	%RBX<def> = POP64r %RSP<imp-def>, %RSP<imp-use>
	%R14<def> = POP64r %RSP<imp-def>, %RSP<imp-use>
	RETQ %EAX

Into:

	CALL64pcrel32 <ga:@foo>, <regmask>, %RSP<imp-use>, %RSP<imp-def>, %EAX<imp-def>
	%RAX<def> = MOV64rr %R14<kill>
	%AL<def,tied1> = ADD8ri %AL<tied0>, 127, %EFLAGS<imp-def>
	SAHF %EFLAGS<imp-def>, %AH<imp-use>
	%EAX<def,tied1> = CMOVE32rr %EAX<kill,tied0>, %EBX<kill>, %EFLAGS<imp-use>
	%RSP<def,tied1> = ADD64ri8 %RSP<tied0>, 8, %EFLAGS<imp-def,dead>
	%RBX<def> = POP64r %RSP<imp-def>, %RSP<imp-use>
	%R14<def> = POP64r %RSP<imp-def>, %RSP<imp-use>
	RETQ %EAX

Note the lack of AX save/restore because liveness thinks it ins't live at this location.

@t.p.northover do you think that's indeed where the bug lies, or am I missing something?

Note the lack of AX save/restore because liveness thinks it ins't live at this location.

@t.p.northover do you think that's indeed where the bug lies, or am I missing something?

I think so. It looks like a miscommunication between
computeRegisterLiveness and analyzePhysReg.

The call gets analysed as clobbering AX but not defining it, which the
computation treats as "Dead". I'm really not sure how the output of
analyzePhysReg should be interpreted though. At first glance
(comparing against comments & documentation) it seems like it's
confusing super-regs with sub-regs, but I don't think even that would
fix it completely.

Cheers.

Tim.

jfb retitled this revision from x86 NaCl: Emit LAHF/SAHF instead of PUSHF/POPF to x86: Emit LAHF/SAHF instead of PUSHF/POPF.Jan 8 2015, 3:55 PM
jfb updated this object.
jfb updated this revision to Diff 31273.Aug 3 2015, 3:06 PM
jfb marked 5 inline comments as done.
jfb updated this object.
  • Fix MO's analyzePhysReg, it was confusing sub- and super-registers. Problem pointed out by Michael Hordijk.
jfb added a comment.Aug 3 2015, 3:10 PM

@rnk @t.p.northover @jvoung: this patch is now unblocked, and should be ready to commit if it looks good to you :-)
Thanks to Michael for tracking down the final issue, which only this test exercised.

jfb added a comment.Aug 5 2015, 1:52 PM

Ugh, I was trying to commit D11382 and committed this as r244120 instead, with an entirely unhelpful commit message :-s
Sorry about doing this, I'll revert it for now, and wait for proper signoff.

rnk added inline comments.Aug 5 2015, 2:06 PM
lib/Target/X86/X86InstrInfo.cpp
3271–3272

I think we can stengthen this to say that using POPF is incorrect, since it can accidentally reset things like TF and IF, and we don't want to do that.

3299

I'm concerned that we don't have the right kill states here and below. Unfortunately, I don't know enough to say what we should be doing. :(

jfb updated this revision to Diff 31406.Aug 5 2015, 3:43 PM
jfb marked an inline comment as done.
  • Comment on correctness.
lib/Target/X86/X86InstrInfo.cpp
3299

Sciencedog here: how would I test this? Push it to do the wrong thing?

rnk accepted this revision.Aug 10 2015, 1:34 PM
rnk added a reviewer: rnk.

lgtm

While I'm concerned that we aren't managing the kill flags right, this seems like a strict improvement of the situation.

This revision is now accepted and ready to land.Aug 10 2015, 1:34 PM
This revision was automatically updated to reflect the committed changes.
jfb added a comment.Aug 12 2015, 10:44 AM

There's a follow-up discussion about LAHF/SAHF not being supported by all x86 processors.

sanjoy added a subscriber: sanjoy.Oct 3 2015, 12:51 AM
sanjoy added inline comments.
llvm/trunk/lib/CodeGen/MachineInstrBundle.cpp
313 ↗(On Diff #31719)

Was this bit intentional? I think (without understanding the surrounding code) this is contributing to PR25033.

jfb added inline comments.Oct 4 2015, 11:46 AM
llvm/trunk/lib/CodeGen/MachineInstrBundle.cpp
313 ↗(On Diff #31719)

Yes, this change was intentional: the code used to get confused about sub/super reg before and my patch would tickle the issues (because it uses AL and AH separately. I believe that the previous code had no clue about subregs, and so other code was simply incorrect but that never manifested because of this bug. I think my fix may tickle other bugs :-)

Michael Hordijk tracked down the problem in the mailing list:

The IR:

        CALL64pcrel32 <ga:@foo>, <regmask>, %RSP<imp-use>, %RSP<imp-def>,
%EAX<imp-def>
        %RAX<def> = MOV64rr %R14<kill>

So CALL defines EAX, and we're asking computeRegisterLiveness whether or
not RAX is live. computeRegisterLiveness walks backwards and one thing it
looks for is whether the register is being defined:

if (IsRegOrSuperReg) {
    PRI.Defines = true;     // Reg or a super-register is defined.
    if (!MO.isDead())
    AllDefsDead = false;
}

And this is what determines if we (RAX) is a register or a super-register
of the register being defined (lib/CodeGen/MachineInstrBundle.cpp):

313:    bool IsRegOrSuperReg = MOReg == Reg || TRI->isSubRegister(MOReg,
Reg);

This will return false, as RAX is not a sub register of EAX.

sanjoy added inline comments.Oct 4 2015, 1:02 PM
llvm/trunk/lib/CodeGen/MachineInstrBundle.cpp
313 ↗(On Diff #31719)

I'm not familiar with the LLVM backend, so here is what I've *assumed*
about "clobber" vs. "define":

  • a register is clobbered if it is partially written to
  • a register is defined if it is fully written to either by a move to that register, or to some super-register that wholly contains the said register

If these assumptions are wrong, then you can ignore everything I've
said below. :)

With these assumptions in place, I read the code in analyzePhysReg this way:

IsRegOrSuperReg == true ==> if MOReg is written to (in the
containing MI), then Reg is defined. Specifically, if MOReg is
EAX and Reg is RAX then this is false, since even if the
instruction is defines EAX, it does not *define* RAX (though it
may still clobber it).

IsRegOrOverlapping == true ==> if MOReg is written to, Reg is
clobbered (i.e. partially written to). This should be true if MOReg
is EAX and Reg is RAX.

I think the initial problem (before this change, in the IR you showed) was not that
IsRegOrSuperReg was false when it should have been true, but that
in computeRegisterLiveness we have

if (Analysis.Kills || Analysis.Clobbers)
  // Register killed, so isn't live.
  return LQR_Dead;

I don't think we can return LQR_Dead if Analysis.Clobbers is true.