Page MenuHomePhabricator

Fix for Bug 28144
ClosedPublic

Authored by vivekvpandya on Jun 15 2016, 12:15 PM.

Details

Summary

This patch contains fix for the bug https://llvm.org/bugs/show_bug.cgi?id=28144 related to register usage calculation, previously it was not considering callee RegMasks at callsites in the Machinefunction thus generating wrong register usage information.

Diff Detail

Event Timeline

vivekvpandya retitled this revision from to Fix for Bug 28144 .
vivekvpandya updated this object.
vivekvpandya added inline comments.Jun 15 2016, 12:17 PM
lib/CodeGen/RegUsageInfoCollector.cpp
117

I have added this code after marking callee saved registers as preserved because if any callee inside the MF body is using the register , which has marked safe above then that will be correct register usage information.

vivekvpandya retitled this revision from Fix for Bug 28144 to Fix for Bug 28144 and print register preserved in comments in asm files.Jun 15 2016, 12:18 PM
mcrosier added inline comments.Jun 15 2016, 12:45 PM
lib/CodeGen/RegUsageInfoCollector.cpp
122
if (!MO.isRegMask())
  continue;
124

Please rename index to i and pre-increment i.

for (unsigned i = 0; i < regMaskSize; ++i)

Also, did you run the lit tests before submitting? I think this may break one of your original tests.

mcrosier added inline comments.Jun 15 2016, 2:13 PM
lib/CodeGen/AsmPrinter/AsmPrinter.cpp
671 ↗(On Diff #60874)

No need for extra curly brackets.

mehdi_amini added inline comments.Jun 15 2016, 2:13 PM
lib/CodeGen/RegUsageInfoCollector.cpp
117

Can you take an example? I don't make sense of it.

mcrosier edited edge metadata.Jun 15 2016, 2:21 PM

Please rebase this patch after r272838.

@mcrosier thanks for pointing test case failure as now RegMask calculation changed test case needs to be updated. Actually lnt test-suite passed all test so I just forget to run lit tests.

vivekvpandya marked 2 inline comments as done.Jun 15 2016, 7:42 PM
vivekvpandya added inline comments.
lib/CodeGen/RegUsageInfoCollector.cpp
117

@mehdi_amini This is not real example but this is what I have thought of when writing this code. Consider foo() sets R15 as per the calling convention as preserved register but one of its callee lat say bar() actually clobbers R15 then foo() will mark R15 as clobbered. I think you may have better reason.

mehdi_amini added inline comments.Jun 15 2016, 9:12 PM
lib/CodeGen/RegUsageInfoCollector.cpp
117

Isn't your example showing exactly why it is not the right ordering?

vivekvpandya added inline comments.Jun 15 2016, 9:22 PM
lib/CodeGen/RegUsageInfoCollector.cpp
117

@mehdi_amini Do you mean that if foo()'s CC preserves R15 , it would be saved by foo() so no problem if foo's caller is using R15 ?

vivekvpandya edited edge metadata.

updates according to suggestions and test case fix.

mehdi_amini requested changes to this revision.Jun 15 2016, 10:46 PM
mehdi_amini edited edge metadata.
This revision now requires changes to proceed.Jun 15 2016, 10:46 PM
mehdi_amini added inline comments.Jun 15 2016, 10:46 PM
test/CodeGen/X86/ipra-reg-usage.ll
6

foo has preserve_allcc but it is clobbering a lot of registers? Is it what you expect?

vivekvpandya edited edge metadata.

The MachineFunction's CC is used to optimize number of clobbered register after callee RegMask has considered for calculation. The correctness of this order is tested with test-suite and nothing breaks due to this.

vivekvpandya marked an inline comment as done.Jun 16 2016, 9:13 AM

I hope this patch is good now.

I'm going to defer to Mehdi or another mentor for final approval.

MatzeB added inline comments.Jun 17 2016, 11:46 AM
lib/CodeGen/AsmPrinter/AsmPrinter.cpp
659–673 ↗(On Diff #60985)

This part seems independent to me and could go into a separate commit.

lib/CodeGen/RegUsageInfoCollector.cpp
113–126

Note that the register allocator already computes this information. You can simply use something like MachineRegisterInfo::getUsedPhysRegsMask() to get it.

vivekvpandya added inline comments.Jun 18 2016, 9:36 AM
lib/CodeGen/RegUsageInfoCollector.cpp
113–126

@MatzeB I tried out MachineRegisterInfo::getUsedPhysRegsMask() instead of this loops, that produces more strict regmask due to that no additional failures have been noted in test-suites but around 300 test cases have performance regression also consider following test case:

; RUN: llc -enable-ipra -print-regusage -o /dev/null 2>&1 < %s | FileCheck %s

target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.12.0"

; Verify that bar does not clobber anything
; CHECK-NOT: bar Clobbered Registers:{{.+}}
; CHECK: bar Clobbered Registers:
define void @bar() #0 {
  ret void
}

; Verifies that inline assembly is correctly handled by giving a list of clobbered registers
; CHECK: foo Clobbered Registers: AH AL AX CH CL CX DI DIL EAX ECX EDI RAX RCX RDI
define void @foo() #0 {
  call void asm sideeffect "", "~{eax},~{ecx},~{edi}"() #0
  ret void
}

attributes #0 = { nounwind }

now function foo clobbers, AH AL AX CH CL CS CX DH DI DIL DL DS DX EAX ECX EDI EDX EFLAGS EIP EIZ ES ESI ESP FPSW FS GS IP RAX RCX RDI RDX RIP RIZ RSI RSP SI SIL SP SPL SS BND0 BND1 BND2 BND3 CR0 CR1 CR2 CR3 CR4 CR5 CR6 CR7 CR8 CR9 CR10 CR11 CR12 CR13 CR14 CR15 DR0 DR1 DR2 DR3 DR4 DR5 DR6 DR7 DR8 DR9 DR10 DR11 DR12 DR13 DR14 DR15 FP0 FP1 FP2 FP3 FP4 FP5 FP6 FP7 K0 K1 K2 K3 K4 K5 K6 K7 MM0 MM1 MM2 MM3 MM4 MM5 MM6 MM7 R8 R9 R10 R11 ST0 ST1 ST2 ST3 ST4 ST5 ST6 ST7 XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 XMM9 XMM10 XMM11 XMM12 XMM13 XMM14 XMM15 XMM16 XMM17 XMM18 XMM19 XMM20 XMM21 XMM22 XMM23 XMM24 XMM25 XMM26 XMM27 XMM28 XMM29 XMM30 XMM31 YMM0 YMM1 YMM2 YMM3 YMM4 YMM5 YMM6 YMM7 YMM8 YMM9 YMM10 YMM11 YMM12 YMM13 YMM14 YMM15 YMM16 YMM17 YMM18 YMM19 YMM20 YMM21 YMM22 YMM23 YMM24 YMM25 YMM26 YMM27 YMM28 YMM29 YMM30 YMM31 ZMM0 ZMM1 ZMM2 ZMM3 ZMM4 ZMM5 ZMM6 ZMM7 ZMM8 ZMM9 ZMM10 ZMM11 ZMM12 ZMM13 ZMM14 ZMM15 ZMM16 ZMM17 ZMM18 ZMM19 ZMM20 ZMM21 ZMM22 ZMM23 ZMM24 ZMM25 ZMM26 ZMM27 ZMM28 ZMM29 ZMM30 ZMM31 R8B R9B R10B R11B R8D R9D R10D R11D R8W R9W R10W R11W

foo does not have any function calls so I don't understand this result.

vivekvpandya retitled this revision from Fix for Bug 28144 and print register preserved in comments in asm files to Fix for Bug 28144.
vivekvpandya updated this object.
vivekvpandya edited edge metadata.
mehdi_amini added inline comments.Jun 18 2016, 10:36 AM
lib/CodeGen/RegUsageInfoCollector.cpp
113–126

Matze: I believe the register allocator does not account for register clobbered by callees.

test/CodeGen/X86/ipra-reg-usage.ll
6

This comment is marked as done, but I haven't seen any answer and the code hasn't been updated?

vivekvpandya added inline comments.Jun 18 2016, 10:46 AM
test/CodeGen/X86/ipra-reg-usage.ll
6

The callee saved registers as per CC has been added after function calls clobbered regmask has been considered. The clobbered registers are less compare to previous order.

vivekvpandya added inline comments.Jun 18 2016, 11:53 AM
test/CodeGen/X86/ipra-reg-usage.ll
6

This is correct detail , and here R11 is marked clobbered as preserve_allcc does not preserve R11, it can be verified as follow:
preserve_all CC returns CSR_64_RT_AllRegs_SaveList from getCalleeSavedRegs() and CSR_64_RT_AllRegs_SaveList is defined as follow in X86CallingConv.td

// All GPRs - except r11
def CSR_64_RT_MostRegs : CalleeSavedRegs<(add CSR_64, RAX, RCX, RDX, RSI, RDI,
                                              R8, R9, R10, RSP)>;

// All registers - except r11
def CSR_64_RT_AllRegs     : CalleeSavedRegs<(add CSR_64_RT_MostRegs,
                                                 (sequence "XMM%u", 0, 15))>;
vivekvpandya marked an inline comment as not done.Jun 18 2016, 11:55 AM
mehdi_amini added inline comments.Jun 18 2016, 10:46 PM
test/CodeGen/X86/ipra-reg-usage.ll
6

Why do we clobber XMM* YMM* and ZMM*? Is it also expected?

vivekvpandya added inline comments.Jun 18 2016, 10:54 PM
test/CodeGen/X86/ipra-reg-usage.ll
6

YMM* and ZMM* are clobbered due to reg mask of bar1 and bar2 has them clobbered.

mehdi_amini added inline comments.Jun 18 2016, 10:57 PM
test/CodeGen/X86/ipra-reg-usage.ll
6

My question is more about the "preserve_allcc".

vivekvpandya added inline comments.Jun 18 2016, 11:00 PM
test/CodeGen/X86/ipra-reg-usage.ll
6

as you can see CSR_64_RT_MostRegs do not add YMM* or ZMM* so preserve_allcc does not preserve them.

vivekvpandya added inline comments.Jun 18 2016, 11:46 PM
test/CodeGen/X86/ipra-reg-usage.ll
6

again preserve_allcc preserves XMM0 to XMM15 that can be seen from CSR_64_RT_AllRegs but it does not preserve XMM16 onwards

MatzeB added inline comments.Jun 20 2016, 10:27 AM
lib/CodeGen/RegUsageInfoCollector.cpp
113–126

Why do you say so? The registers clobbered by the callees is exactlye what the UsedPhysRegMask in MachineRegisterInfo is about, see also the use of addPhysRegsUsedFromRegMask() in VirtRegMap.cpp/FastRegAlloc.cpp.

vivekvpandya added inline comments.Jun 20 2016, 9:32 PM
lib/CodeGen/RegUsageInfoCollector.cpp
113–126
// Code snippet 1
  for (MachineBasicBlock &MBB : MF) {
    for (MachineInstr &MI : MBB) {
      if (!MI.isCall())
        continue;
      for (MachineOperand &MO : MI.operands()) {
        if (!MO.isRegMask())
          continue;
        const uint32_t *CalleeRegMask = MO.getRegMask();
        //DEBUG(dbgs() << "Callee RegMask : \n");
        //printRegMask(TRI, CalleeRegMask);
        for (unsigned i = 0; i < RegMaskSize; ++i)
          RegMask[i] = RegMask[i] & CalleeRegMask[i];
        break;
      }
    }
  }
// Code snippet 2
BitVector UsedPhysRegMask = MRI->getUsedPhysRegsMask();
  for (unsigned i = 0; i < RegMaskSize; ++i)
           RegMask[i] = RegMask[i] & UsedPhysRegMask[i];

@MatzeB consider following simple test case for comparing Code snippet 1 (CS1)
over Code snippet 2 (CS2)

target triple = "x86_64-unknown-unknown"
define void @bar1() {
	ret void
}
define preserve_allcc void @foo()#0 {
	call void @bar1()
	call void @bar2()
	ret void
}
define void @bar2() {
	ret void
}
attributes #0 = {nounwind}

with CS1 for this test case bar1 and bar2 does not clobbered any register how ever
when I use CS2 it bar1 and bar2 clobberes following registers:
AH AL AX CH CL CS CX DH DI DIL DL DS DX EAX ECX EDI EDX EFLAGS EIP EIZ ES ESI ESP FPSW FS GS IP RAX RCX RDI RDX RIP RIZ RSI RSP SI SIL SP SPL SS BND0 BND1 BND2 BND3 CR0 CR1 CR2 CR3 CR4 CR5 CR6 CR7 CR8 CR9 CR10 CR11 CR12 CR13 CR14 CR15 DR0 DR1 DR2 DR3 DR4 DR5 DR6 DR7 DR8 DR9 DR10 DR11 DR12 DR13 DR14 DR15 FP0 FP1 FP2 FP3 FP4 FP5 FP6 FP7 K0 K1 K2 K3 K4 K5 K6 K7 MM0 MM1 MM2 MM3 MM4 MM5 MM6 MM7 R8 R9 R10 R11 ST0 ST1 ST2 ST3 ST4 ST5 ST6 ST7 XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 XMM9 XMM10 XMM11 XMM12 XMM13 XMM14 XMM15 XMM16 XMM17 XMM18 XMM19 XMM20 XMM21 XMM22 XMM23 XMM24 XMM25 XMM26 XMM27 XMM28 XMM29 XMM30 XMM31 YMM0 YMM1 YMM2 YMM3 YMM4 YMM5 YMM6 YMM7 YMM8 YMM9 YMM10 YMM11 YMM12 YMM13 YMM14 YMM15 YMM16 YMM17 YMM18 YMM19 YMM20 YMM21 YMM22 YMM23 YMM24 YMM25 YMM26 YMM27 YMM28 YMM29 YMM30 YMM31 ZMM0 ZMM1 ZMM2 ZMM3 ZMM4 ZMM5 ZMM6 ZMM7 ZMM8 ZMM9 ZMM10 ZMM11 ZMM12 ZMM13 ZMM14 ZMM15 ZMM16 ZMM17 ZMM18 ZMM19 ZMM20 ZMM21 ZMM22 ZMM23 ZMM24 ZMM25 ZMM26 ZMM27 ZMM28 ZMM29 ZMM30 ZMM31 R8B R9B R10B R11B R8D R9D R10D R11D R8W R9W R10W R11W

now as with CS1 we are getting correct code for test-suite I don't think we should
use more strict regmask and I also don't understand why with CS2 I am getting more clobbered registers
if getUsedPhysRegsMask() is retruning used regs due to calls in given MF.

I think now this is in good shape. I don't have commit access so if this is ok now then please commit it.

mehdi_amini added inline comments.Jun 21 2016, 8:57 PM
lib/CodeGen/RegUsageInfoCollector.cpp
115

Add a comment before this hunk.

test/CodeGen/X86/ipra-reg-usage.ll
6

The list sounds more reasonable now. I think you should have sanitized one-by-one in the first place.

vivekvpandya added inline comments.Jun 21 2016, 9:01 PM
test/CodeGen/X86/ipra-reg-usage.ll
6

I am really confused with this result, because locally I added a print method to see which regs are clobbered in mask and I verified the output. But it seems I need to dig more about this this indicates perhaps with 2 loop approach I have missed something

mehdi_amini added inline comments.Jun 21 2016, 9:05 PM
test/CodeGen/X86/ipra-reg-usage.ll
6

If you're confused with this result, you shouldn't ask us to commit this patch, but should first make sure we all have a good understanding of what's going on here.
Unfortunately I won't have much time to dig in here.

vivekvpandya added inline comments.Jun 21 2016, 9:08 PM
test/CodeGen/X86/ipra-reg-usage.ll
6

I am looking into it.

@mehdi_amini ,@MatzeB
Here is my analysis:

BitVector UsedPhysRegMask = MRI->getUsedPhysRegsMask();
  for (unsigned i = 0; i < RegMaskSize; ++i)
    RegMask[i] &= ~UsedPhysRegMask[i];

The above code is buggy because UsedPhysRegMask is indexed by register no where
as RegMask is vector integers in which each bit represenet status of one register.
I am sorry for that, I assumed that in LLVM all regmask are following same representation.

// Code snippet 1 (CS1)
  for (unsigned PReg = 1, PRegE = TRI->getNumRegs(); PReg < PRegE; ++PReg)
    if (!MRI->reg_nodbg_empty(PReg) && MRI->isPhysRegUsed(PReg))
      markRegClobbered(TRI, &RegMask[0], PReg);
// Code snippet 2 (CS2)
  for (unsigned PReg = 1, PRegE = TRI->getNumRegs(); PReg < PRegE; ++PReg)
    if (MRI->isPhysRegUsed(PReg))
      markRegClobbered(TRI, &RegMask[0], PReg);

Now actaully CS2 is sufficient and correct code for calculating regmask. And CS1
is having a very subtle bug which caused incorrect regmask (such as not marking
R11 as clobbered). Now why CS2 is correct and sufficient (I think @MatzeB has
pointed this fact earlier) it is because MRI::isPhysRegUsed actaully uses
UsedPhysRegMask so no need to consider that details again and MRI::isPhysRegUsed
also uses reg_nodbg_empty.

Now what is the problem with CS1, due to !MRI->reg_nodbg_empty(PReg) condition
fails when MF is not using PReg and MRI->isPhysRegUsed(PReg) would never execute.
thus generating wrong regmask. So why it happens so. Consider following definition
of reg_nodbg_empty and reg_nodbg_begin, it uses getRegUseDefListHead and if given
register is not used in side a MF then it will return true (I think it should not
Isn't reg_nodbg_empty return false if Reg is not used in MF at all?)
and due to that !MRI->reg_nodbg_empty(PReg) will be false.

/// reg_nodbg_empty - Return true if the only instructions using or defining
/// Reg are Debug instructions.
  bool reg_nodbg_empty(unsigned RegNo) const {
     return reg_nodbg_begin(RegNo) == reg_nodbg_end();
  }

  reg_nodbg_iterator reg_nodbg_begin(unsigned RegNo) const {
     return reg_nodbg_iterator(getRegUseDefListHead(RegNo));
   }

What are your thoughts? Have I missed any thing?

@mehdi_amini @MatzeB test-suite shows some new runtime failures, these failures are different compare to previous time i.e test-suite/SingleSource/Benchmarks/Stanford/Towers.c compiles and executes fine but SingleSource/Benchmarks/Misc/fbench and some other fails.

@mehdi_amini @MatzeB the failures mentioned in above comment is due to local function related changes (setting regmask to none) I am looking more into this. @MatzeB please share your views on this.

Changes as per discussion over email.

mehdi_amini added inline comments.Jun 28 2016, 8:02 AM
include/llvm/CodeGen/MachineRegisterInfo.h
712 ↗(On Diff #61593)

Document the new parameter

lib/CodeGen/MachineRegisterInfo.cpp
512 ↗(On Diff #61593)

Merge the two if with &&

ping! Also as I don't have commit access, who ever is going to commit this in future please commit this change before committing http://reviews.llvm.org/D21561

You can mark the dependency in phabricator explicitly.

mehdi_amini added inline comments.Jul 6 2016, 1:01 PM
lib/CodeGen/MachineRegisterInfo.cpp
511 ↗(On Diff #62105)

Do you have a test for that?
Are you fixing two different bugs here?

vivekvpandya added inline comments.Jul 6 2016, 1:06 PM
lib/CodeGen/MachineRegisterInfo.cpp
511 ↗(On Diff #62105)

No this not fix for local function related bug, but this is for code reusability. As per discussion with MtazeB and Hal method to calculate register usage is same except isNoReturnDef is not required. So we decide to add a parameter with optional value. I don't think we require test case for this.

LGTM.
@MatzeB: anything else?

MatzeB accepted this revision.Jul 8 2016, 11:38 AM
MatzeB edited edge metadata.

LGTM

I don't have commit access, please some one commit this if it does not require rebase.

Committed r275087.

Thanks @mcrosier for committing this and also updating the bug report.

mehdi_amini accepted this revision.Jul 12 2016, 4:10 PM
mehdi_amini edited edge metadata.
This revision is now accepted and ready to land.Jul 12 2016, 4:10 PM
mehdi_amini closed this revision.Jul 12 2016, 4:10 PM

(Closing since this has been committed)