This is an archive of the discontinued LLVM Phabricator instance.

[X86] Don't zero out %eax if both %al and %ah are used
ClosedPublic

Authored by void on Dec 8 2022, 3:31 PM.

Details

Summary

The iterator over super and sub registers doesn't include both 8-bit
registers in its list. So if both registers are used and only one of
them is live on return, then we need to make sure that the other 8-bit
register is also marked as live and not zeroed out.

Diff Detail

Event Timeline

void created this revision.Dec 8 2022, 3:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 8 2022, 3:31 PM
void requested review of this revision.Dec 8 2022, 3:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 8 2022, 3:31 PM
arsenm added inline comments.Dec 8 2022, 3:59 PM
llvm/lib/Target/X86/X86FrameLowering.cpp
499–501 ↗(On Diff #481464)

Why is this doing so much work? Why can't this use LivePhysRegUnits and addLiveOuts?

535 ↗(On Diff #481464)

Not checking for a def?

nickdesaulniers removed a subscriber: pengfei.

The iterator over super and sub registers doesn't include both 8-bit registers in its list.

This is the use of sub_and_superregs_inclusive in the caller PEI::insertZeroCallUsedRegs() that you're referring to? Does that need to be fixed? Is there another iterator that gives us both? Should there be? Maybe @pengfei , @RKSimon , or @craig.topper knows?

The generated TargetRegisterClass for GR8RegClass and GRH8RegClass BOTH have set CoveredBySubRegs to false.

llvm/lib/Target/X86/X86FrameLowering.cpp
527 ↗(On Diff #481464)

Is live out of the function itself? Or between basic blocks?

MachineRegisterInfo can answer whether a given register is live in MachineRegisterInfo::isLiveIn. I assume that's the same side of the coin for "is a given register live out of my predecessor?" Perhaps that's better to query than looking into every operand of every instruction in the function?

MachineBasicBlock::liveouts() also gives you an iterator to a RegisterPairMask. Might be useful here?

540–541 ↗(On Diff #481464)

Is it possible to never set the bits of RegsToZero in PEI::insertZeroCallUsedRegs() for these in the first place?

llvm/test/CodeGen/X86/zero-call-used-regs-i386.ll
21

add nounwind to to the attribute list of @test1 to remove these obnoxious .cfi directives which are just noise.

Though, I do also wonder if a MIR test (-stop-after=prologepilog) would be less brittle/dependent on %al being allocated. Then we could have 1 function per register and know precisely what happens for which registers.

void added inline comments.Dec 9 2022, 12:03 PM
llvm/lib/Target/X86/X86FrameLowering.cpp
499–501 ↗(On Diff #481464)

Is there an example of how to use them?

527 ↗(On Diff #481464)

MachineBasicBlock::liveouts() doesn't appear to work here.

535 ↗(On Diff #481464)

No. We're only checking if it's live on the return instruction. I won't be counted as a def.

540–541 ↗(On Diff #481464)

There doesn't seem to be an easy mechanism for doing that. The iterator doesn't appear to go over both the high and low registers. I'm not sure if that's a bug or by design.

void added a comment.Dec 9 2022, 12:05 PM

The iterator over super and sub registers doesn't include both 8-bit registers in its list.

This is the use of sub_and_superregs_inclusive in the caller PEI::insertZeroCallUsedRegs() that you're referring to? Does that need to be fixed? Is there another iterator that gives us both? Should there be? Maybe @pengfei , @RKSimon , or @craig.topper knows?

The generated TargetRegisterClass for GR8RegClass and GRH8RegClass BOTH have set CoveredBySubRegs to false.

That might be by design, if by sub register they mean something like %al is a sub register of %ax, because they "overlap."

nickdesaulniers accepted this revision.Dec 9 2022, 2:35 PM

I think x86 is an oddity in that TargetRegisterInfo::sub_and_superregs_inclusive doesn't return sub_8bit or sub_8bit_hi. I suspect this is because in llvm/lib/Target/X86/X86RegisterInfo.td they're not marked as CoveredBySubRegs. This patch isn't doing that much work; it's looking at each MachineBasicBlock terminator operands if the terminate is a return instruction only.

The offending MachineInstr in this case is RET 0, $al.

I spent some time playing with MCRegisterDesc and TargetRegisterInfo::getNumSubRegIndices (and sub register indices in general) and couldn't quite get what I wanted. getX86SubSuperRegister looks like a hack/lacks symmetry/works around CoveredBySubRegs and I can't help but wonder if that's a bug, or if marking sub_8bit or sub_8bit_hi as CoveredBySubRegs would break other things.

Please update the comment and add nounwind to the test.

llvm/lib/Target/X86/X86FrameLowering.cpp
527 ↗(On Diff #481464)

Please update the comment to s/live out/live out of the function's return instructions/

This revision is now accepted and ready to land.Dec 9 2022, 2:35 PM

The iterator over super and sub registers doesn't include both 8-bit registers in its list.

This is the use of sub_and_superregs_inclusive in the caller PEI::insertZeroCallUsedRegs() that you're referring to? Does that need to be fixed? Is there another iterator that gives us both? Should there be? Maybe @pengfei , @RKSimon , or @craig.topper knows?

The generated TargetRegisterClass for GR8RegClass and GRH8RegClass BOTH have set CoveredBySubRegs to false.

I didn't know sub_and_superregs_inclusive. But I think I figured it out with a small code https://godbolt.org/z/9q8j66sYr
Unfortunately I cannot figure out how to succefully link it in CE. Here is the command I used locally clang++ Regs.cpp -I llvm/include/ -std=c++17 -Ibuilds/include -lLLVMSupport -lLLVMCore -lLLVMMC -lLLVMMCDisassembler -lLLVMTarget -lLLVMX86Desc -lLLVMX86Disassembler -lLLVMX86Info -lLLVMBinaryFormat -Lbuilds/lib -fuse-ld=lld -lpthread -lz -lzstd -fno-rtti -ltinfo.
As you can see, when pass AH (2) to it, it returns AH, AX, EAX and RAX. But if we pass AX (3) to it, it returns AL, AH, AX, EAX and RAX. I think the API works as expected because AL is not sub of AH but sibling.

void added a comment.Dec 12 2022, 11:10 AM

The iterator over super and sub registers doesn't include both 8-bit registers in its list.

This is the use of sub_and_superregs_inclusive in the caller PEI::insertZeroCallUsedRegs() that you're referring to? Does that need to be fixed? Is there another iterator that gives us both? Should there be? Maybe @pengfei , @RKSimon , or @craig.topper knows?

The generated TargetRegisterClass for GR8RegClass and GRH8RegClass BOTH have set CoveredBySubRegs to false.

I didn't know sub_and_superregs_inclusive. But I think I figured it out with a small code https://godbolt.org/z/9q8j66sYr
Unfortunately I cannot figure out how to succefully link it in CE. Here is the command I used locally clang++ Regs.cpp -I llvm/include/ -std=c++17 -Ibuilds/include -lLLVMSupport -lLLVMCore -lLLVMMC -lLLVMMCDisassembler -lLLVMTarget -lLLVMX86Desc -lLLVMX86Disassembler -lLLVMX86Info -lLLVMBinaryFormat -Lbuilds/lib -fuse-ld=lld -lpthread -lz -lzstd -fno-rtti -ltinfo.
As you can see, when pass AH (2) to it, it returns AH, AX, EAX and RAX. But if we pass AX (3) to it, it returns AL, AH, AX, EAX and RAX. I think the API works as expected because AL is not sub of AH but sibling.

Thanks, @pengfei, that confirms what I suspected. Maybe there should be an "and siblings" iterator as well.

void updated this revision to Diff 482203.Dec 12 2022, 11:28 AM
void marked an inline comment as done.

Add "nounwind" to testcase and update comment.

void updated this revision to Diff 482299.Dec 12 2022, 3:36 PM

Reformat

nickdesaulniers accepted this revision.Dec 12 2022, 3:38 PM
arsenm added inline comments.Dec 12 2022, 6:59 PM
llvm/lib/Target/X86/X86FrameLowering.cpp
499–501 ↗(On Diff #481464)

Simplest looking one looks like DeadMachineInstructionElim. Just addLiveOuts and stepBackward to the point you want. Then check what's live

arsenm added inline comments.Dec 12 2022, 7:01 PM
llvm/lib/Target/X86/X86FrameLowering.cpp
530–531 ↗(On Diff #482299)

I think you would be better off operating in terms of regunits and then going backwards to registers.

void planned changes to this revision.Dec 13 2022, 12:12 PM
void added inline comments.
llvm/lib/Target/X86/X86FrameLowering.cpp
530–531 ↗(On Diff #482299)

Akchewally, it looks like using RegUnits in the prologue-epilogue inserter gives us what we want. I'll change this.

void updated this revision to Diff 482616.Dec 13 2022, 1:26 PM

Use RegUnit to get any sibling registers.

This revision is now accepted and ready to land.Dec 13 2022, 1:26 PM
void requested review of this revision.Dec 13 2022, 1:27 PM

Okay. This should be a better version of the patch. PTAL.

nickdesaulniers accepted this revision.Dec 13 2022, 2:16 PM
This revision is now accepted and ready to land.Dec 13 2022, 2:16 PM
RKSimon closed this revision.Dec 14 2022, 10:47 AM