This is an archive of the discontinued LLVM Phabricator instance.


Differential D6629

x86: Emit LAHF/SAHF instead of PUSHF/POPF
Closed · Public

Authored by jfb on Dec 11 2014, 2:58 PM.


Details

Reviewers

t.p.northover
rnk
jvoung

Commits

rGfa9746dc8d86: x86: Emit LAHF/SAHF instead of PUSHF/POPF
rL244503: x86: Emit LAHF/SAHF instead of PUSHF/POPF

Summary

NaCl's sandbox doesn't allow PUSHF/POPF out of security concerns (privileged emulators have forgotten to mask system bits in the past, and EFLAGS's DF bit is a constant source of hilarity). Commit r220529 fixed PR20376 by saving cmpxchg's flags result with PUSHF/POPF, i.e. by copying EFLAGS out through the stack and back. This commit generates LAHF/SAHF instead, for all of x86 (not just NaCl), because it leads to an overall performance gain over PUSHF/POPF.

As with the previous patch, this code generation is pretty bad because it occurs very late, after register allocation, and in many cases it rematerializes flags which were already available (e.g. already in a register through SETE). Fortunately, it's somewhat rare that this code needs to fire.
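LAHF and SAHF move only five status flags through AH. That is why the patch has to pair them with SETO for OF. It is also why they cannot disturb TF, IF or DF the way POPF can. The following is a minimal model of that behaviour, written for this page and not taken from the patch. It assumes the EFLAGS bit positions and the LAHF byte format (SF:ZF:0:AF:0:PF:1:CF) given in Intel's Software Developer's Manual. The names `lahf`, `sahf`, `kLahfMask` and the `k*` flag constants are illustrative only.

```cpp
#include <cstdint>

// EFLAGS bit positions (Intel SDM, Vol. 1, "EFLAGS Register").
constexpr std::uint32_t kCF = 1u << 0;   // carry
constexpr std::uint32_t kPF = 1u << 2;   // parity
constexpr std::uint32_t kAF = 1u << 4;   // auxiliary carry
constexpr std::uint32_t kZF = 1u << 6;   // zero
constexpr std::uint32_t kSF = 1u << 7;   // sign
constexpr std::uint32_t kTF = 1u << 8;   // trap (system flag)
constexpr std::uint32_t kIF = 1u << 9;   // interrupt enable (system flag)
constexpr std::uint32_t kDF = 1u << 10;  // direction
constexpr std::uint32_t kOF = 1u << 11;  // overflow

// The five status flags LAHF/SAHF transfer; all of them live in the low byte.
constexpr std::uint32_t kLahfMask = kSF | kZF | kAF | kPF | kCF;

// Model of LAHF: AH <- SF:ZF:0:AF:0:PF:1:CF (bit 1 always reads as 1).
constexpr std::uint8_t lahf(std::uint32_t eflags) {
  return static_cast<std::uint8_t>((eflags & kLahfMask) | 0x02u);
}

// Model of SAHF: copies SF, ZF, AF, PF and CF from AH into EFLAGS and leaves
// every other bit (including OF, TF, IF and DF) unchanged.
constexpr std::uint32_t sahf(std::uint32_t eflags, std::uint8_t ah) {
  return (eflags & ~kLahfMask) | (ah & kLahfMask);
}
```

A LAHF/SAHF round trip restores the low-byte flags but carries no information about OF. The patch therefore saves OF separately in AL.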

I did a bit of benchmarking; the results on an Intel Haswell E5-2690 CPU at 2.9 GHz are:

Time per call (ms)	Runtime (ms)	Benchmark
0.000012514	6257	sete.i386
0.000012810	6405	sete.i386-fast
0.000010456	5228	sete.x86-64
0.000010496	5248	sete.x86-64-fast
0.000012906	6453	lahf-sahf.i386
0.000013236	6618	lahf-sahf.i386-fast
0.000010580	5290	lahf-sahf.x86-64
0.000010304	5152	lahf-sahf.x86-64-fast
0.000028056	14028	pushf-popf.i386
0.000027160	13580	pushf-popf.i386-fast
0.000023810	11905	pushf-popf.x86-64
0.000026468	13234	pushf-popf.x86-64-fast

Clearly PUSHF/POPF are suboptimal. It doesn't really seem to be worth teaching LLVM about individual flags, at least not for this purpose.
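The patch's code comment says "~2.25x faster". That figure can be checked against the table above by dividing the PUSHF/POPF runtime by the LAHF/SAHF runtime for each configuration. The ratios are about 2.17x (i386), 2.05x (i386-fast), 2.25x (x86-64) and 2.57x (x86-64-fast). Dividing a runtime by its time per call gives about 500 million calls for every row, so the per-call times are roughly 10 to 28 ns. The sketch below assumes that reading of the columns. The struct and helper names are made up for illustration.

```cpp
#include <array>
#include <string_view>

// Total runtime in ms for each configuration, copied from the table above.
struct BenchRow {
  std::string_view config;
  double seteMs;
  double lahfSahfMs;
  double pushfPopfMs;
};

constexpr std::array<BenchRow, 4> kRows{{
    {"i386", 6257.0, 6453.0, 14028.0},
    {"i386-fast", 6405.0, 6618.0, 13580.0},
    {"x86-64", 5228.0, 5290.0, 11905.0},
    {"x86-64-fast", 5248.0, 5152.0, 13234.0},
}};

// How many times faster LAHF/SAHF ran than PUSHF/POPF for one configuration.
constexpr double speedupOverPushf(const BenchRow &r) {
  return r.pushfPopfMs / r.lahfSahfMs;
}

// Number of calls implied by a row: total runtime divided by time per call.
constexpr double impliedCalls(double runtimeMs, double perCallMs) {
  return runtimeMs / perCallMs;
}
```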

Diff Detail

Repository: rL LLVM

Event Timeline

jfb updated this revision to Diff 17195. Dec 11 2014, 2:58 PM

jfb retitled this revision to x86 NaCl: Emit LAHF/SAHF instead of PUSHF/POPF.

jfb updated this object.

jfb edited the test plan for this revision. (Show Details)

jfb added reviewers: t.p.northover, jvoung.

jfb added a subscriber: Unknown Object (MLST).

Herald added a subscriber: jfb. Dec 11 2014, 2:58 PM

t.p.northover added inline comments. Dec 12 2014, 7:17 AM

lib/Target/X86/X86InstrInfo.cpp
3281–3282 ↗	(On Diff #17195)	I don't think this works. This expansion occurs post-RA doesn't it? In which case AX could be live.

jfb added inline comments. Dec 12 2014, 8:46 AM

lib/Target/X86/X86InstrInfo.cpp
3281–3282 ↗	(On Diff #17195)	Correct, that's why AX is pushed/popped before and after lahf/sahf.

LLVM should never, ever generate popf from normal code. It's ridiculously slow. We should use sahf/lahf if we can.

If you think it's useful I can make the code unconditional w.r.t. NaCl, and
make sahf/lahf the only thing generated here. I don't think performance is
a real argument here, though, since this code path shouldn't get exercised
often. If we really cared about performance then we wouldn't hit this path
in the first place!

In my second attempt to make an intelligent comment here...

Doesn't this neglect the overflow flag?

lib/Target/X86/X86InstrInfo.cpp
3281–3282 ↗	(On Diff #17195)	Oh bother, don't know how I missed that. Sorry.

Save OF too.

In D6629#101179, @t.p.northover wrote:

In my second attempt to make an intelligent comment here...

Doesn't this neglect the overflow flag?

Eek you're totally right... I updated the patch, and if you thought the first one was horrible you'll find this one even better!
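The OF fix can be checked in isolation. SETO leaves 0 or 1 in AL. Adding INT8_MAX (127) as a signed 8-bit value overflows only when AL is 1, so the ADD recreates the saved OF. The following is an illustrative constexpr model, not LLVM code. It assumes standard two's-complement ADD semantics for OF, and the function names are hypothetical.

```cpp
#include <cstdint>

// SETO AL: AL <- 1 if OF is set, else 0.
constexpr std::uint8_t seto(bool of) { return of ? 1 : 0; }

// OF as produced by an 8-bit ADD: set when the exact sum of the two signed
// operands does not fit in int8_t.
constexpr bool addbSetsOF(std::int8_t a, std::int8_t b) {
  int sum = int(a) + int(b);
  return sum < INT8_MIN || sum > INT8_MAX;
}

// `addb $127, %al` (ADD8ri AL, INT8_MAX) applied to the byte saved by SETO:
// 1 + 127 = 128 overflows int8_t and 0 + 127 = 127 does not, so OF comes back.
constexpr bool restoredOF(std::uint8_t al) {
  return addbSetsOF(static_cast<std::int8_t>(al), INT8_MAX);
}
```

The ADD also rewrites CF, ZF, SF, AF and PF, so SAHF has to run after it. SAHF does not touch OF, so the OF produced by the ADD survives. An 8-bit ADD to AL leaves AH, which holds the LAHF byte, unchanged.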

OK, I'm afraid I have a few more questions:

What happens for "EAX = COPY EFLAGS"? (It's not good).
I'm not sure what's available here, but there are ways to check liveness too, which would allow the push/pop to be skipped entirely if AX is dead.
We probably want to be more careful with the flags on the instructions (at the least LAHF & SAHF should be marked with EFLAGS I think, probably we should get the kill states right too).
With all this complexity, it might be time for a helper function.

And while we're at it, I think it'd be good to have a single code sequence too. Unless this dance turns out to be slower than pushf/popf (not impossible, even if those are slow).

In D6629#101238, @t.p.northover wrote:

OK, I'm afraid I have a few more questions:

What happens for "EAX = COPY EFLAGS"? (It's not good).

You mean if the user actually wants to copy EFLAGS for system code? Indeed that would be broken. Currently the test's code for test_intervening_call sees the following after ISel lowering:

*** MachineFunction at end of ISel ***
# Machine code for function test_intervening_call: SSA
Function Live Ins: %EDI in %vreg0, %RSI in %vreg1, %RDX in %vreg2

BB#0: derived from LLVM BB %0
    Live Ins: %EDI %RSI %RDX
	%vreg2<def> = COPY %RDX; GR64:%vreg2
	%vreg1<def> = COPY %RSI; GR64:%vreg1
	%vreg0<def> = COPY %EDI; GR32:%vreg0
	%RAX<def> = COPY %vreg1; GR64:%vreg1
	LCMPXCHG64 %vreg0, 1, %noreg, 0, %noreg, %vreg2, %RAX<imp-def>, %EFLAGS<imp-def>, %RAX<imp-use>; mem:Volatile LDST8[%foo] GR32:%vreg0 GR64:%vreg2
	%vreg3<def> = COPY %RAX; GR64:%vreg3
	%vreg4<def> = COPY %EFLAGS; GR64:%vreg4
	ADJCALLSTACKDOWN32 0, %ESP<imp-def,dead>, %EFLAGS<imp-def,dead>, %ESP<imp-use>
	CALLpcrel32 <ga:@bar>, <regmask>, %ESP<imp-use>, %ESP<imp-def>, %EAX<imp-def>
	ADJCALLSTACKUP32 0, 0, %ESP<imp-def,dead>, %EFLAGS<imp-def,dead>, %ESP<imp-use>
	%vreg5<def> = COPY %EAX; GR32:%vreg5
	%EFLAGS<def> = COPY %vreg4; GR64:%vreg4
	JNE_4 <BB#2>, %EFLAGS<imp-use>
	JMP_4 <BB#1>
    Successors according to CFG: BB#1(16) BB#2(16)

AFAICT the code I wrote doesn't have a way to distinguish between the user asking for a true copy of EFLAGS, and ISel deciding that it needed to save/restore flags and deciding that EFLAGS was the way to go. This isn't an issue for NaCl because there isn't such a thing as copying EFLAGS. For system-mode LLVM trying to compile e.g. Linux that could be an issue, though I assume that this is done with inline assembly and not IR. Maybe ISel lowering should be changed? What do you think?

I'm not sure what's available here, but there are ways to check liveness too, which would allow the push/pop to be skipped entirely if AX is dead.

I'm not very familiar with this code, guidance appreciated. That looks like something PeepholeOptimizer.cpp should do (like it does with X86InstrInfo::optimizeCompareInstr and similar), but again ISel doesn't give it quite the information it needs to figure it out. We'd need to call out which individual flags are live. In the general case we'd use LAHF/SAHF + SETO; otherwise, depending on the subset of live flags, we could emit a simple SET<cc>. GCC manages to do this, and the repro that I posted on the original PR is much cleaner in GCC's case because it realizes that the flag it cares about has already been materialized.

Do we care to do this, though? This code shouldn't get emitted often, so is it worth optimizing/testing/maintaining?

We probably want to be more careful with the flags on the instructions (at the least LAHF & SAHF should be marked with EFLAGS I think, probably we should get the kill states right too).

I think that's already the case, unless I misunderstand what you want. See test_intervening_call after post-RA expansion (when LAHF/SAHF get added):

# *** IR Dump After Post-RA pseudo instruction expansion pass ***:
# Machine code for function test_intervening_call: Post SSA
Frame Objects:
  fi#-1: size=8, align=16, fixed, at location [SP-8]
Function Live Ins: %EDI in %vreg0, %RSI in %vreg1, %RDX in %vreg2

BB#0: derived from LLVM BB %0
    Live Ins: %EDI %RSI %RDX %RBX
	PUSH64r %RBX<kill>, %RSP<imp-def>, %RSP<imp-use>; flags: FrameSetup
	CFI_INSTRUCTION <call frame instruction>
	CFI_INSTRUCTION <call frame instruction>
	%RAX<def> = MOV64rr %RSI<kill>
	LCMPXCHG64 %EDI<kill>, 1, %noreg, 0, %noreg, %RDX<kill>, %RAX<imp-def,dead>, %EFLAGS<imp-def>, %RAX<imp-use,kill>; mem:Volatile LDST8[%foo]
	PUSH64r %RAX, %RSP<imp-def>, %RSP<imp-use>
	%RAX<def> = SETOr %EFLAGS<imp-use>
	LAHF %AH<imp-def>, %EFLAGS<imp-use>
	%RBX<def> = MOV64rr %RAX
	%RAX<def> = POP64r %RSP<imp-def>, %RSP<imp-use>
	CALLpcrel32 <ga:@bar>, <regmask>, %ESP<imp-use>, %ESP<imp-def>, %EAX<imp-def,dead>
	PUSH64r %RAX, %RSP<imp-def>, %RSP<imp-use>
	%RAX<def> = MOV64rr %RBX<kill>
	%AL<def,tied1> = ADD8ri %AL<tied0>, 127, %EFLAGS<imp-def>
	SAHF %EFLAGS<imp-def>, %AH<imp-use>
	%RAX<def> = POP64r %RSP<imp-def>, %RSP<imp-use>
	JNE_4 <BB#2>, %EFLAGS<imp-use>
    Successors according to CFG: BB#1(16) BB#2(16)

With all this complexity, it might be time for a helper function.

You mean: I should pull the EFLAGS handling out of X86InstrInfo::copyPhysReg? I can do that, I just want to make sure that's what you're suggesting.

And while we're at it, I think it'd be good to have a single code sequence too. Unless this dance turns out to be slower than pushf/popf (not impossible, even if those are slow).

That'll depend on the answer to your above question: does this break system code that builds with LLVM? I don't think it does (since it probably uses inline assembly). If that's the case then I can craft a benchmark and see what the cost is, which should give us the data needed to decide what to do.

Hi,

jvoung added inline comments. Dec 15 2014, 5:24 PM

test/CodeGen/X86/cmpxchg-clobber-flags.ll
5 ↗	(On Diff #17245)	IIRC adding "-verify-machineinstrs" to these tests' command-line flags can help check that the liveness info is correct, so it might help verify that you have the right kill states.

jfb mentioned this in D6687: x86-32: PUSHF/POPF use/def EFLAGS. Dec 16 2014, 11:41 AM

jfb added inline comments. Dec 16 2014, 11:43 AM

test/CodeGen/X86/cmpxchg-clobber-flags.ll
5 ↗	(On Diff #17245)	This found issues, I sent a separate patch for some of them: D6687.

jfb mentioned this in rL224359: x86-32: PUSHF/POPF use/def EFLAGS. Dec 16 2014, 12:16 PM

Rebase to grab changes from D6629, including -verify-machineinstrs. I still need to fix the new tests, which now fail verification.

Check liveness of AX before push/pop.
SETcc takes 8-bit register AL.

Following up with @t.p.northover's earlier comments:

In D6629#101719, @t.p.northover wrote:

What happens for "EAX = COPY EFLAGS"? (It's not good).

You mean if the user actually wants to copy EFLAGS for system code? Indeed that would be broken.

No, I mean if LLVM happened to decide to spill EFLAGS to EAX (allocate
vreg4 to EAX in your example). The final pop would clobber the value
we'd just carefully constructed.

I'm not sure what's available here, but there are ways to check liveness too, which would allow the push/pop to be skipped entirely if AX is dead.

I'm not very familiar with this code, guidance appreciated.

I meant MachineBasicBlock::computeRegisterLiveness. If it tells us AX
might be live, save it, otherwise skip it.
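To see how the liveness check shapes the output, the sketch below models the instruction sequence that the patched copyPhysReg emits, in AT&T syntax. It is a hypothetical model, not LLVM code. `flagsCopySequence` is an illustrative name. The `axLive` parameter stands in for `computeRegisterLiveness` returning anything other than `LQR_Dead` for RAX/EAX at the copy. AX is also treated as dead when it is the copy's own register, because the final pop would otherwise clobber the result.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the EFLAGS <-> GR32/GR64 copy sequence.
// `reg` is the general-purpose side of the copy (e.g. "%rbx").
std::vector<std::string> flagsCopySequence(bool fromEFLAGS, bool is64,
                                           const std::string &reg,
                                           bool axLive) {
  assert(!reg.empty() && "copy needs a general-purpose register operand");
  const std::string ax = is64 ? "%rax" : "%eax";
  const std::string sfx = is64 ? "q" : "l";
  // Skip the save/restore when AX is the copy's own register or is dead here.
  const bool axDead = (reg == ax) || !axLive;

  std::vector<std::string> seq;
  if (!axDead)
    seq.push_back("push" + sfx + " " + ax);
  if (fromEFLAGS) {
    seq.push_back("seto %al");                            // OF -> AL
    seq.push_back("lahf");                                // SF ZF AF PF CF -> AH
    seq.push_back("mov" + sfx + " " + ax + ", " + reg);   // stash AX in reg
  } else {
    seq.push_back("mov" + sfx + " " + reg + ", " + ax);   // reload stashed AX
    seq.push_back("addb $127, %al");  // overflows iff AL == 1, restoring OF
    seq.push_back("sahf");            // AH -> SF ZF AF PF CF
  }
  if (!axDead)
    seq.push_back("pop" + sfx + " " + ax);
  return seq;
}
```

With `fromEFLAGS = true`, `is64 = true`, `reg = "%rbx"` and live AX, the model yields the same push/seto/lahf/mov/pop shape as the x8664 CHECK lines in the test. With AX dead it yields the bare mov/addb/sahf restore seen in the test_feed_cmov expansion.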

Done in my latest update, though it looks like this is exposing a bug. I think cmpxchg is correctly annotated as def+kill of EAX, but the liveness analysis seems to think it's dead:

BB#0: derived from LLVM BB %0
    Live Ins: %EBX %ESI
	PUSH32r %EBX<kill>, %ESP<imp-def>, %ESP<imp-use>; flags: FrameSetup
	CFI_INSTRUCTION <call frame instruction>
	PUSH32r %ESI<kill>, %ESP<imp-def>, %ESP<imp-use>; flags: FrameSetup
	CFI_INSTRUCTION <call frame instruction>
	CFI_INSTRUCTION <call frame instruction>
	CFI_INSTRUCTION <call frame instruction>
	%EAX<def> = MOV32rm %ESP, 1, %noreg, 16, %noreg; mem:LD4[FixedStack-2]
	%EDX<def> = MOV32rm %ESP, 1, %noreg, 20, %noreg; mem:LD4[FixedStack-3]
	%EBX<def> = MOV32rm %ESP, 1, %noreg, 24, %noreg; mem:LD4[FixedStack-4]
	%ECX<def> = MOV32rm %ESP, 1, %noreg, 28, %noreg; mem:LD4[FixedStack-5]
	%ESI<def> = MOV32rm %ESP, 1, %noreg, 12, %noreg; mem:LD4[FixedStack-1]
	LCMPXCHG8B %ESI<kill>, 1, %noreg, 0, %noreg, %EAX<imp-def,dead>, %EDX<imp-def,dead>, %EFLAGS<imp-def>, %EAX<imp-use>, %EBX<imp-use>, %ECX<imp-use>, %EDX<imp-use>; mem:Volatile LDST8[%foo]
	%AL<def> = SETOr %EFLAGS<imp-use>
	LAHF %AH<imp-def>, %EFLAGS<imp-use>
	%ESI<def> = MOV32rr %EAX
	CALLpcrel32 <ga:@bar>, <regmask>, %ESP<imp-use>, %ESP<imp-def>, %EAX<imp-def,dead>
	%EAX<def> = MOV32rr %ESI<kill>
	%AL<def,tied1> = ADD8ri %AL<tied0>, 127, %EFLAGS<imp-def>
	SAHF %EFLAGS<imp-def>, %AH<imp-use>
	JNE_4 <BB#3>, %EFLAGS<imp-use>
    Successors according to CFG: BB#1(16) BB#3(16)

We probably want to be more careful with the flags on the instructions (at the least LAHF & SAHF should be marked with EFLAGS I think, probably we should get the kill states right too).

I think that's already the case, unless I misunderstand what you want. See test_intervening_call after post-RA expansion (when LAHF/SAHF get added):

Oh good, it obviously comes from the MCInstrDesc somehow. The flags
aren't quite correct though:

I updated push to kill AX. I'm not sure which other maintenance should be done on the liveness state.

With all this complexity, it might be time for a helper function.

You mean: I should pull the EFLAGS handling out of X86InstrInfo::copyPhysReg? I can do that, I just want to make sure that's what you're suggesting.

Yep, that's what I meant.

Haven't done that yet, just to keep the diff easier to follow.

Update test to properly eliminate dead AX push/pop. My previous comment was wrong: AX was indeed defined but subsequently unused. The test now exercises live AX before and after the call, and everything looks good.

Alright, here's what's left to figure out:

Is liveness maintenance correct?
Should I handle individual flags more precisely? That means I could emit as little as a single SETcc, or just LAHF/SAHF and omit SETO.
Should I make the NaCl path the only one (i.e. never emit PUSHF/POPF)?
Refactor the code, if pulling out a function makes sense.

CALL is now preceded by MOV since @bar takes arguments.

Re-running the NaCl tests, only i386 works. The -pre-RA-sched=fast version fails, and so do both x86-64 versions, because the calling convention and register allocator collude to change AX's liveness from one version to another. I could have 2 CHECK tests for i386, and 1 for x86-64.

I did a bit of benchmarking; the results on an Intel Haswell E5-2690 CPU at 2.9 GHz are:

Time per call (ms)	Runtime (ms)	Benchmark
0.000012514	6257	sete.i386
0.000012810	6405	sete.i386-fast
0.000010456	5228	sete.x86-64
0.000010496	5248	sete.x86-64-fast
0.000012906	6453	lahf-sahf.i386
0.000013236	6618	lahf-sahf.i386-fast
0.000010580	5290	lahf-sahf.x86-64
0.000010304	5152	lahf-sahf.x86-64-fast
0.000028056	14028	pushf-popf.i386
0.000027160	13580	pushf-popf.i386-fast
0.000023810	11905	pushf-popf.x86-64
0.000026468	13234	pushf-popf.x86-64-fast

Clearly PUSHF/POPF are suboptimal; I'll therefore delete that code and only keep the NaCl code (and make it non-NaCl specific). I'll update the tests to show the differences in register allocation. It doesn't really seem to be worth teaching LLVM about individual flags, at least not for this purpose.

There also seems to be a bug in x86-64 when generating code for test_feed_cmov.

Remove the NaCl specificity, and never emit PUSHF/POPF. Update the test accordingly.

There now seems to be a bug with the code generation in x86-64 for test_feed_cmov. The following gets generated:

test_feed_cmov:
	pushq	%r14
	pushq	%rbx
	pushq	%rax
	movl	%edx, %ebx
	movl	%esi, %eax
	lock
	cmpxchgl	%ebx, (%rdi)
	seto	%al
	lahf
	movq	%rax, %r14
	callq	foo
	movq	%r14, %rax
	addb	$127, %al
	sahf
	cmovel	%ebx, %eax
	addq	$8, %rsp
	popq	%rbx
	popq	%r14
	retq

The bitcode:

define i32 @test_feed_cmov(i32* %addr, i32 %desired, i32 %new) {
  %res = cmpxchg i32* %addr, i32 %desired, i32 %new seq_cst seq_cst
  %success = extractvalue { i32, i1 } %res, 1
  %rhs = call i32 @foo()
  %ret = select i1 %success, i32 %new, i32 %rhs
  ret i32 %ret
}

IIUC it's correctly returning %ebx (%new) on success but on failure it's returning %eax which is clobbered 3 times after callq sets it. I think CMOV's use/def isn't annotated properly:

	%EAX<def,tied1> = CMOVE32rr %EAX<kill,tied0>, %EBX<kill>, %EFLAGS<imp-use>

I'm trying to figure out the dark magic that tablegen conjured into X86InstrCMovSetCC.td, and which incantation would nullify this powerful spell.

At this point I think the issue is in how liveness is tagged onto CMOV, but I'm not sure I understand how that happens from the tablegen files. Post-RA pseudo instruction expansion pass transforms the following:

	CALL64pcrel32 <ga:@foo>, <regmask>, %RSP<imp-use>, %RSP<imp-def>, %EAX<imp-def>
	%EFLAGS<def> = COPY %R14<kill>
	%EAX<def,tied1> = CMOVE32rr %EAX<kill,tied0>, %EBX<kill>, %EFLAGS<imp-use>
	%RSP<def,tied1> = ADD64ri8 %RSP<tied0>, 8, %EFLAGS<imp-def,dead>
	%RBX<def> = POP64r %RSP<imp-def>, %RSP<imp-use>
	%R14<def> = POP64r %RSP<imp-def>, %RSP<imp-use>
	RETQ %EAX

Into:

	CALL64pcrel32 <ga:@foo>, <regmask>, %RSP<imp-use>, %RSP<imp-def>, %EAX<imp-def>
	%RAX<def> = MOV64rr %R14<kill>
	%AL<def,tied1> = ADD8ri %AL<tied0>, 127, %EFLAGS<imp-def>
	SAHF %EFLAGS<imp-def>, %AH<imp-use>
	%EAX<def,tied1> = CMOVE32rr %EAX<kill,tied0>, %EBX<kill>, %EFLAGS<imp-use>
	%RSP<def,tied1> = ADD64ri8 %RSP<tied0>, 8, %EFLAGS<imp-def,dead>
	%RBX<def> = POP64r %RSP<imp-def>, %RSP<imp-use>
	%R14<def> = POP64r %RSP<imp-def>, %RSP<imp-use>
	RETQ %EAX

Note the lack of AX save/restore because liveness thinks it isn't live at this location.

@t.p.northover do you think that's indeed where the bug lies, or am I missing something?

Note the lack of AX save/restore because liveness thinks it isn't live at this location.

@t.p.northover do you think that's indeed where the bug lies, or am I missing something?

I think so. It looks like a miscommunication between
computeRegisterLiveness and analyzePhysReg.

The call gets analysed as clobbering AX but not defining it, which the
computation treats as "Dead". I'm really not sure how the output of
analyzePhysReg should be interpreted though. At first glance
(comparing against comments & documentation) it seems like it's
confusing super-regs with sub-regs, but I don't think even that would
fix it completely.

Cheers.

Tim.

jfb retitled this revision from x86 NaCl: Emit LAHF/SAHF instead of PUSHF/POPF to x86: Emit LAHF/SAHF instead of PUSHF/POPF. Jan 8 2015, 3:55 PM

jfb updated this object.

Fix MO's analyzePhysReg, it was confusing sub- and super-registers. Problem pointed out by Michael Hordijk.

@rnk @t.p.northover @jvoung: this patch is now unblocked, and should be ready to commit if it looks good to you :-)
Thanks to Michael for tracking down the final issue, which only this test exercised.

Ugh, I was trying to commit D11382 and committed this as r244120 instead, with an entirely unhelpful commit message :-s
Sorry about doing this, I'll revert it for now, and wait for proper signoff.

jfb mentioned this in rL244121: Revert "Fix MO's analyzePhysReg, it was confusing sub- and super-registers." Aug 5 2015, 1:54 PM

rnk added inline comments. Aug 5 2015, 2:06 PM

lib/Target/X86/X86InstrInfo.cpp
3912–3913 ↗	(On Diff #31273)	I think we can strengthen this to say that using POPF is incorrect, since it can accidentally reset things like TF and IF, and we don't want to do that.
3940 ↗	(On Diff #31273)	I'm concerned that we don't have the right kill states here and below. Unfortunately, I don't know enough to say what we should be doing. :(

Comment on correctness.

lib/Target/X86/X86InstrInfo.cpp
3940 ↗	(On Diff #31273)	Sciencedog here: how would I test this? Push it to do the wrong thing?

lgtm

While I'm concerned that we aren't managing the kill flags right, this seems like a strict improvement of the situation.

This revision is now accepted and ready to land. Aug 10 2015, 1:34 PM

Closed by commit rL244503: x86: Emit LAHF/SAHF instead of PUSHF/POPF (authored by jfb). Aug 10 2015, 2:00 PM

This revision was automatically updated to reflect the committed changes.

There's a follow-up discussion about LAHF/SAHF not being supported by all x86 processors.

sanjoy added a subscriber: sanjoy. Oct 3 2015, 12:51 AM

sanjoy added inline comments.

llvm/trunk/lib/CodeGen/MachineInstrBundle.cpp
313	Was this bit intentional? I think (without understanding the surrounding code) this is contributing to PR25033.

jfb added inline comments. Oct 4 2015, 11:46 AM

llvm/trunk/lib/CodeGen/MachineInstrBundle.cpp
313	Yes, this change was intentional: the code used to get confused about sub- and super-registers, and my patch would tickle the issue (because it uses AL and AH separately). I believe that the previous code had no clue about subregs, so other code was simply incorrect, but that never manifested because of this bug. I think my fix may tickle other bugs :-)

Michael Hordijk tracked down the problem on the mailing list. The IR:

	CALL64pcrel32 <ga:@foo>, <regmask>, %RSP<imp-use>, %RSP<imp-def>, %EAX<imp-def>
	%RAX<def> = MOV64rr %R14<kill>

So `CALL` defines `EAX`, and we're asking `computeRegisterLiveness` whether or not `RAX` is live. `computeRegisterLiveness` walks backwards, and one thing it looks for is whether the register is being defined:

	if (IsRegOrSuperReg) {
	  PRI.Defines = true;     // Reg or a super-register is defined.
	  if (!MO.isDead())
	    AllDefsDead = false;
	}

And this is what determines whether `RAX` is the register being defined, or a super-register of it (`lib/CodeGen/MachineInstrBundle.cpp`):

	313: bool IsRegOrSuperReg = MOReg == Reg || TRI->isSubRegister(MOReg, Reg);

This will return false, as `RAX` is not a sub-register of `EAX`.

sanjoy added inline comments. Oct 4 2015, 1:02 PM

llvm/trunk/lib/CodeGen/MachineInstrBundle.cpp
313	I'm not familiar with the LLVM backend, so here is what I've assumed about "clobber" vs. "define":

- a register is clobbered if it is partially written to;
- a register is defined if it is fully written to, either by a move to that register or to some super-register that wholly contains the said register.

If these assumptions are wrong, then you can ignore everything I've said below. :)

With these assumptions in place, I read the code in `analyzePhysReg` this way:

- `IsRegOrSuperReg == true` ==> if `MOReg` is written to (in the containing MI), then `Reg` is defined. Specifically, if `MOReg` is `EAX` and `Reg` is `RAX`, then this is `false`: even if the instruction defines `EAX`, it does not define `RAX` (though it may still clobber it).
- `IsRegOrOverlapping == true` ==> if `MOReg` is written to, `Reg` is clobbered (i.e. partially written to). This should be true if `MOReg` is `EAX` and `Reg` is `RAX`.

I think the initial problem (before this change, in the IR you showed) was not that `IsRegOrSuperReg` was `false` when it should have been true, but that in `computeRegisterLiveness` we have

	if (Analysis.Kills || Analysis.Clobbers)
	  // Register killed, so isn't live.
	  return LQR_Dead;

I don't think we can return `LQR_Dead` if `Analysis.Clobbers` is true.
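The sketch below evaluates both versions of the line-313 predicate on a toy model of the RAX register family, so the two readings can be compared directly. The model is hypothetical: the byte masks, the `Reg` enum and the helper names are not LLVM's. The argument order of `isSubRegister`/`isSuperRegister` follows our reading of the `TargetRegisterInfo` documentation, where `isSubRegister(A, B)` asks whether B is a sub-register of A. The sketch only shows what each predicate returns. It does not settle which one `analyzePhysReg` should use.

```cpp
#include <cstdint>

// Toy model: each register is described by the bytes of RAX it covers.
enum Reg : std::uint8_t { AL, AH, AX, EAX, RAX };

constexpr std::uint8_t bytesOf(Reg r) {
  switch (r) {
  case AL:  return 0x01;  // byte 0
  case AH:  return 0x02;  // byte 1
  case AX:  return 0x03;  // bytes 0-1
  case EAX: return 0x0F;  // bytes 0-3
  case RAX: return 0xFF;  // bytes 0-7
  }
  return 0;
}

// isSubRegister(A, B): B is a strict sub-register of A.
constexpr bool isSubRegister(Reg a, Reg b) {
  return a != b && (bytesOf(b) & ~bytesOf(a)) == 0;
}
// isSuperRegister(A, B): B is a strict super-register of A.
constexpr bool isSuperRegister(Reg a, Reg b) { return isSubRegister(b, a); }
constexpr bool regsOverlap(Reg a, Reg b) {
  return (bytesOf(a) & bytesOf(b)) != 0;
}

// The predicate at MachineInstrBundle.cpp:313 before and after this patch.
constexpr bool isRegOrSuperRegBefore(Reg moReg, Reg reg) {
  return moReg == reg || isSubRegister(moReg, reg);
}
constexpr bool isRegOrSuperRegAfter(Reg moReg, Reg reg) {
  return moReg == reg || isSuperRegister(moReg, reg);
}
```

Consider MOReg = EAX and Reg = RAX. The old predicate is false and the new one is true, while `regsOverlap` is true either way; this is the "clobbered but not defined" case sanjoy describes. Conversely, after the change a write to RAX no longer counts as defining EAX.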

maksfb mentioned this in rG85ffa8e4ba44: [PR][BOLT][Instrumentation] Optimize eflags load/store. Jan 11 2022, 1:39 PM

Revision Contents

Path	Size
llvm/trunk/lib/CodeGen/MachineInstrBundle.cpp	2 lines
llvm/trunk/lib/Target/X86/X86InstrInfo.cpp	79 lines
llvm/trunk/test/CodeGen/X86/cmpxchg-clobber-flags.ll	126 lines

Diff 31719

llvm/trunk/lib/CodeGen/MachineInstrBundle.cpp

	 ...
	   for (; isValid(); ++*this) {
	 ...
	     if (!MO.isReg())
	       continue;

	     unsigned MOReg = MO.getReg();
	     if (!MOReg || !TargetRegisterInfo::isPhysicalRegister(MOReg))
	       continue;

	-    bool IsRegOrSuperReg = MOReg == Reg || TRI->isSubRegister(MOReg, Reg);
	+    bool IsRegOrSuperReg = MOReg == Reg || TRI->isSuperRegister(MOReg, Reg);
	     bool IsRegOrOverlapping = MOReg == Reg || TRI->regsOverlap(MOReg, Reg);

	     if (IsRegOrSuperReg && MO.readsReg()) {
	       // Reg or a super-reg is read, and perhaps killed also.
	       PRI.Reads = true;
	       PRI.Kills = MO.isKill();
	     }
	 ...

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp


	 ...
	   if (!Opc)
	     Opc = CopyToFromAsymmetricReg(DestReg, SrcReg, Subtarget);

	   if (Opc) {
	     BuildMI(MBB, MI, DL, get(Opc), DestReg)
	       .addReg(SrcReg, getKillRegState(KillSrc));
	     return;
	   }

	-  // Moving EFLAGS to / from another register requires a push and a pop.
	-  // Notice that we have to adjust the stack if we don't want to clobber the
	-  // first frame index. See X86FrameLowering.cpp - clobbersTheStack.
	-  if (SrcReg == X86::EFLAGS) {
	-    if (X86::GR64RegClass.contains(DestReg)) {
	-      BuildMI(MBB, MI, DL, get(X86::PUSHF64));
	-      BuildMI(MBB, MI, DL, get(X86::POP64r), DestReg);
	-      return;
	-    }
	-    if (X86::GR32RegClass.contains(DestReg)) {
	-      BuildMI(MBB, MI, DL, get(X86::PUSHF32));
	-      BuildMI(MBB, MI, DL, get(X86::POP32r), DestReg);
	-      return;
	-    }
	-  }
	-  if (DestReg == X86::EFLAGS) {
	-    if (X86::GR64RegClass.contains(SrcReg)) {
	-      BuildMI(MBB, MI, DL, get(X86::PUSH64r))
	-        .addReg(SrcReg, getKillRegState(KillSrc));
	-      BuildMI(MBB, MI, DL, get(X86::POPF64));
	-      return;
	-    }
	-    if (X86::GR32RegClass.contains(SrcReg)) {
	-      BuildMI(MBB, MI, DL, get(X86::PUSH32r))
	-        .addReg(SrcReg, getKillRegState(KillSrc));
	-      BuildMI(MBB, MI, DL, get(X86::POPF32));
	-      return;
	-    }
	-  }
	+  bool FromEFLAGS = SrcReg == X86::EFLAGS;
	+  bool ToEFLAGS = DestReg == X86::EFLAGS;
	+  int Reg = FromEFLAGS ? DestReg : SrcReg;
	+  bool is32 = X86::GR32RegClass.contains(Reg);
	+  bool is64 = X86::GR64RegClass.contains(Reg);
	+  if ((FromEFLAGS || ToEFLAGS) && (is32 || is64)) {
	+    // The flags need to be saved, but saving EFLAGS with PUSHF/POPF is
	+    // inefficient. Instead:
	+    //   - Save the overflow flag OF into AL using SETO, and restore it using a
	+    //     signed 8-bit addition of AL and INT8_MAX.
	+    //   - Save/restore the bottom 8 EFLAGS bits (CF, PF, AF, ZF, SF) to/from AH
	+    //     using LAHF/SAHF.
	+    //   - When RAX/EAX is live and isn't the destination register, make sure it
	+    //     isn't clobbered by PUSH/POP'ing it before and after saving/restoring
	+    //     the flags.
	+    // This approach is ~2.25x faster than using PUSHF/POPF.
	+    //
	+    // This is still somewhat inefficient because we don't know which flags are
	+    // actually live inside EFLAGS. Were we able to do a single SETcc instead of
	+    // SETO+LAHF / ADDB+SAHF the code could be 1.02x faster.
	+    //
	+    // PUSHF/POPF is also potentially incorrect because it affects other flags
	+    // such as TF/IF/DF, which LLVM doesn't model.
	+    //
	+    // Notice that we have to adjust the stack if we don't want to clobber the
	+    // first frame index. See X86FrameLowering.cpp - clobbersTheStack.
	+
	+    int Mov = is64 ? X86::MOV64rr : X86::MOV32rr;
	+    int Push = is64 ? X86::PUSH64r : X86::PUSH32r;
	+    int Pop = is64 ? X86::POP64r : X86::POP32r;
	+    int AX = is64 ? X86::RAX : X86::EAX;
	+
	+    bool AXDead = (Reg == AX) ||
	+                  (MachineBasicBlock::LQR_Dead ==
	+                   MBB.computeRegisterLiveness(&getRegisterInfo(), AX, MI));
	+
	+    if (!AXDead)
	+      BuildMI(MBB, MI, DL, get(Push)).addReg(AX, getKillRegState(true));
	+    if (FromEFLAGS) {
	+      BuildMI(MBB, MI, DL, get(X86::SETOr), X86::AL);
	+      BuildMI(MBB, MI, DL, get(X86::LAHF));
	+      BuildMI(MBB, MI, DL, get(Mov), Reg).addReg(AX);
	+    }
	+    if (ToEFLAGS) {
	+      BuildMI(MBB, MI, DL, get(Mov), AX).addReg(Reg, getKillRegState(KillSrc));
	+      BuildMI(MBB, MI, DL, get(X86::ADD8ri), X86::AL)
	+          .addReg(X86::AL)
	+          .addImm(INT8_MAX);
	+      BuildMI(MBB, MI, DL, get(X86::SAHF));
	+    }
	+    if (!AXDead)
	+      BuildMI(MBB, MI, DL, get(Pop), AX);
	+    return;
	+  }

	   DEBUG(dbgs() << "Cannot copy " << RI.getName(SrcReg)
	         << " to " << RI.getName(DestReg) << '\n');
	   llvm_unreachable("Cannot emit physreg copy instruction");
	 }

	 static unsigned getLoadStoreRegOpcode(unsigned Reg,
	                                       const TargetRegisterClass *RC,
	 ...

llvm/trunk/test/CodeGen/X86/cmpxchg-clobber-flags.ll

	; RUN: llc -verify-machineinstrs -mtriple=i386-linux-gnu %s -o - | FileCheck %s			; RUN: llc -verify-machineinstrs -mtriple=i386-linux-gnu %s -o - | FileCheck %s -check-prefix=i386
	; RUN: llc -verify-machineinstrs -mtriple=i386-linux-gnu -pre-RA-sched=fast %s -o - | FileCheck %s			; RUN: llc -verify-machineinstrs -mtriple=i386-linux-gnu -pre-RA-sched=fast %s -o - | FileCheck %s -check-prefix=i386f
	; RUN: llc -verify-machineinstrs -mtriple=x86_64-linux-gnu %s -o - | FileCheck %s			; RUN: llc -verify-machineinstrs -mtriple=x86_64-linux-gnu %s -o - | FileCheck %s -check-prefix=x8664
	; RUN: llc -verify-machineinstrs -mtriple=x86_64-linux-gnu -pre-RA-sched=fast %s -o - | FileCheck %s			; RUN: llc -verify-machineinstrs -mtriple=x86_64-linux-gnu -pre-RA-sched=fast %s -o - | FileCheck %s -check-prefix=x8664

	declare i32 @bar()			declare i32 @foo()
				declare i32 @bar(i64)

	define i64 @test_intervening_call(i64* %foo, i64 %bar, i64 %baz) {			define i64 @test_intervening_call(i64* %foo, i64 %bar, i64 %baz) {
	; CHECK-LABEL: test_intervening_call:			; i386-LABEL: test_intervening_call:
	; CHECK: cmpxchg			; i386: cmpxchg8b
	; CHECK: pushf[[LQ:[lq]]]			; i386-NEXT: pushl %eax
	; CHECK-NEXT: pop[[LQ]] [[FLAGS:%.*]]			; i386-NEXT: seto %al
				; i386-NEXT: lahf
	; CHECK-NEXT: call[[LQ]] bar			; i386-NEXT: movl %eax, [[FLAGS:%.*]]
				; i386-NEXT: popl %eax
	; CHECK-NEXT: push[[LQ]] [[FLAGS]]			; i386-NEXT: movl %edx, 4(%esp)
	; CHECK-NEXT: popf[[LQ]]			; i386-NEXT: movl %eax, (%esp)
	; CHECK-NEXT: jne			; i386-NEXT: calll bar
				; i386-NEXT: movl [[FLAGS]], %eax
				; i386-NEXT: addb $127, %al
				; i386-NEXT: sahf
				; i386-NEXT: jne

				; i386f-LABEL: test_intervening_call:
				; i386f: cmpxchg8b
				; i386f-NEXT: movl %eax, (%esp)
				; i386f-NEXT: movl %edx, 4(%esp)
				; i386f-NEXT: seto %al
				; i386f-NEXT: lahf
				; i386f-NEXT: movl %eax, [[FLAGS:%.*]]
				; i386f-NEXT: calll bar
				; i386f-NEXT: movl [[FLAGS]], %eax
				; i386f-NEXT: addb $127, %al
				; i386f-NEXT: sahf
				; i386f-NEXT: jne

				; x8664-LABEL: test_intervening_call:
				; x8664: cmpxchgq
				; x8664: pushq %rax
				; x8664-NEXT: seto %al
				; x8664-NEXT: lahf
				; x8664-NEXT: movq %rax, [[FLAGS:%.*]]
				; x8664-NEXT: popq %rax
				; x8664-NEXT: movq %rax, %rdi
				; x8664-NEXT: callq bar
				; x8664-NEXT: movq [[FLAGS]], %rax
				; x8664-NEXT: addb $127, %al
				; x8664-NEXT: sahf
				; x8664-NEXT: jne

	%cx = cmpxchg i64* %foo, i64 %bar, i64 %baz seq_cst seq_cst			%cx = cmpxchg i64* %foo, i64 %bar, i64 %baz seq_cst seq_cst
				%v = extractvalue { i64, i1 } %cx, 0
	%p = extractvalue { i64, i1 } %cx, 1			%p = extractvalue { i64, i1 } %cx, 1
	call i32 @bar()			call i32 @bar(i64 %v)
	br i1 %p, label %t, label %f			br i1 %p, label %t, label %f

	t:			t:
	ret i64 42			ret i64 42

	f:			f:
	ret i64 0			ret i64 0
	}			}

	; Interesting in producing a clobber without any function calls.			; Interesting in producing a clobber without any function calls.
	define i32 @test_control_flow(i32* %p, i32 %i, i32 %j) {			define i32 @test_control_flow(i32* %p, i32 %i, i32 %j) {
	; CHECK-LABEL: test_control_flow:			; i386-LABEL: test_control_flow:
				; i386: cmpxchg
				; i386-NEXT: jne

				; i386f-LABEL: test_control_flow:
				; i386f: cmpxchg
				; i386f-NEXT: jne

				; x8664-LABEL: test_control_flow:
				; x8664: cmpxchg
				; x8664-NEXT: jne

	; CHECK: cmpxchg
	; CHECK-NEXT: jne
	entry:			entry:
	%cmp = icmp sgt i32 %i, %j			%cmp = icmp sgt i32 %i, %j
	br i1 %cmp, label %loop_start, label %cond.end			br i1 %cmp, label %loop_start, label %cond.end

	loop_start:			loop_start:
	br label %while.condthread-pre-split.i			br label %while.condthread-pre-split.i

	while.condthread-pre-split.i:			while.condthread-pre-split.i:
	; (17 unchanged lines not shown)			; (17 unchanged lines not shown)
	cond.end:			cond.end:
	%cond = phi i32 [ %i, %entry ], [ 0, %cond.end.loopexit ]			%cond = phi i32 [ %i, %entry ], [ 0, %cond.end.loopexit ]
	ret i32 %cond			ret i32 %cond
	}			}

	; This one is an interesting case because CMOV doesn't have a chain			; This one is an interesting case because CMOV doesn't have a chain
	; operand. Naive attempts to limit cmpxchg EFLAGS use are likely to fail here.			; operand. Naive attempts to limit cmpxchg EFLAGS use are likely to fail here.
	define i32 @test_feed_cmov(i32* %addr, i32 %desired, i32 %new) {			define i32 @test_feed_cmov(i32* %addr, i32 %desired, i32 %new) {
	; CHECK-LABEL: test_feed_cmov:			; i386-LABEL: test_feed_cmov:
				; i386: cmpxchgl
	; CHECK: cmpxchg			; i386-NEXT: seto %al
	; CHECK: pushf[[LQ:[lq]]]			; i386-NEXT: lahf
	; CHECK-NEXT: pop[[LQ]] [[FLAGS:%.*]]			; i386-NEXT: movl %eax, [[FLAGS:%.*]]
				; i386-NEXT: calll foo
	; CHECK-NEXT: call[[LQ]] bar			; i386-NEXT: pushl %eax
				; i386-NEXT: movl [[FLAGS]], %eax
				; i386-NEXT: addb $127, %al
				; i386-NEXT: sahf
				; i386-NEXT: popl %eax

				; i386f-LABEL: test_feed_cmov:
				; i386f: cmpxchgl
				; i386f-NEXT: seto %al
				; i386f-NEXT: lahf
				; i386f-NEXT: movl %eax, [[FLAGS:%.*]]
				; i386f-NEXT: calll foo
				; i386f-NEXT: pushl %eax
				; i386f-NEXT: movl [[FLAGS]], %eax
				; i386f-NEXT: addb $127, %al
				; i386f-NEXT: sahf
				; i386f-NEXT: popl %eax

				; x8664-LABEL: test_feed_cmov:
				; x8664: cmpxchgl
				; x8664: seto %al
				; x8664-NEXT: lahf
				; x8664-NEXT: movq %rax, [[FLAGS:%.*]]
				; x8664-NEXT: callq foo
				; x8664-NEXT: pushq %rax
				; x8664-NEXT: movq [[FLAGS]], %rax
				; x8664-NEXT: addb $127, %al
				; x8664-NEXT: sahf
				; x8664-NEXT: popq %rax

	; CHECK-NEXT: push[[LQ]] [[FLAGS]]
	; CHECK-NEXT: popf[[LQ]]
	%res = cmpxchg i32* %addr, i32 %desired, i32 %new seq_cst seq_cst			%res = cmpxchg i32* %addr, i32 %desired, i32 %new seq_cst seq_cst
	%success = extractvalue { i32, i1 } %res, 1			%success = extractvalue { i32, i1 } %res, 1

	%rhs = call i32 @bar()			%rhs = call i32 @foo()

	%ret = select i1 %success, i32 %new, i32 %rhs			%ret = select i1 %success, i32 %new, i32 %rhs
	ret i32 %ret			ret i32 %ret
	}			}