This patch contains fix for the bug https://llvm.org/bugs/show_bug.cgi?id=28144 related to register usage calculation, previously it was not considering callee RegMasks at callsites in the Machinefunction thus generating wrong register usage information.
Details
Diff Detail
Event Timeline
lib/CodeGen/RegUsageInfoCollector.cpp | ||
---|---|---|
119 | I have added this code after marking callee saved registers as preserved because if any callee inside the MF body is using the register , which has marked safe above then that will be correct register usage information. |
Also, did you run the lit tests before submitting? I think this may break one of your original tests.
lib/CodeGen/AsmPrinter/AsmPrinter.cpp | ||
---|---|---|
671 | No need for extra curly brackets. |
lib/CodeGen/RegUsageInfoCollector.cpp | ||
---|---|---|
119 | Can you take an example? I don't make sense of it. |
@mcrosier thanks for pointing test case failure as now RegMask calculation changed test case needs to be updated. Actually lnt test-suite passed all test so I just forget to run lit tests.
lib/CodeGen/RegUsageInfoCollector.cpp | ||
---|---|---|
119 | @mehdi_amini This is not real example but this is what I have thought of when writing this code. Consider foo() sets R15 as per the calling convention as preserved register but one of its callee lat say bar() actually clobbers R15 then foo() will mark R15 as clobbered. I think you may have better reason. |
lib/CodeGen/RegUsageInfoCollector.cpp | ||
---|---|---|
119 | Isn't your example showing exactly why it is not the right ordering? |
lib/CodeGen/RegUsageInfoCollector.cpp | ||
---|---|---|
119 | @mehdi_amini Do you mean that if foo()'s CC preserves R15 , it would be saved by foo() so no problem if foo's caller is using R15 ? |
test/CodeGen/X86/ipra-reg-usage.ll | ||
---|---|---|
6 ↗ | (On Diff #60946) | foo has preserve_allcc but it is clobbering a lot of registers? Is it what you expect? |
The MachineFunction's CC is used to optimize number of clobbered register after callee RegMask has considered for calculation. The correctness of this order is tested with test-suite and nothing breaks due to this.
lib/CodeGen/AsmPrinter/AsmPrinter.cpp | ||
---|---|---|
661–675 | This part seems independent to me and could go into a separate commit. | |
lib/CodeGen/RegUsageInfoCollector.cpp | ||
113–126 | Note that the register allocator already computes this information. You can simply use something like MachineRegisterInfo::getUsedPhysRegsMask() to get it. |
lib/CodeGen/RegUsageInfoCollector.cpp | ||
---|---|---|
113–126 | @MatzeB I tried out MachineRegisterInfo::getUsedPhysRegsMask() instead of this loops, that produces more strict regmask due to that no additional failures have been noted in test-suites but around 300 test cases have performance regression also consider following test case: ; RUN: llc -enable-ipra -print-regusage -o /dev/null 2>&1 < %s | FileCheck %s target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-apple-macosx10.12.0" ; Verify that bar does not clobber anything ; CHECK-NOT: bar Clobbered Registers:{{.+}} ; CHECK: bar Clobbered Registers: define void @bar() #0 { ret void } ; Verifies that inline assembly is correctly handled by giving a list of clobbered registers ; CHECK: foo Clobbered Registers: AH AL AX CH CL CX DI DIL EAX ECX EDI RAX RCX RDI define void @foo() #0 { call void asm sideeffect "", "~{eax},~{ecx},~{edi}"() #0 ret void } attributes #0 = { nounwind } now function foo clobbers, AH AL AX CH CL CS CX DH DI DIL DL DS DX EAX ECX EDI EDX EFLAGS EIP EIZ ES ESI ESP FPSW FS GS IP RAX RCX RDI RDX RIP RIZ RSI RSP SI SIL SP SPL SS BND0 BND1 BND2 BND3 CR0 CR1 CR2 CR3 CR4 CR5 CR6 CR7 CR8 CR9 CR10 CR11 CR12 CR13 CR14 CR15 DR0 DR1 DR2 DR3 DR4 DR5 DR6 DR7 DR8 DR9 DR10 DR11 DR12 DR13 DR14 DR15 FP0 FP1 FP2 FP3 FP4 FP5 FP6 FP7 K0 K1 K2 K3 K4 K5 K6 K7 MM0 MM1 MM2 MM3 MM4 MM5 MM6 MM7 R8 R9 R10 R11 ST0 ST1 ST2 ST3 ST4 ST5 ST6 ST7 XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 XMM9 XMM10 XMM11 XMM12 XMM13 XMM14 XMM15 XMM16 XMM17 XMM18 XMM19 XMM20 XMM21 XMM22 XMM23 XMM24 XMM25 XMM26 XMM27 XMM28 XMM29 XMM30 XMM31 YMM0 YMM1 YMM2 YMM3 YMM4 YMM5 YMM6 YMM7 YMM8 YMM9 YMM10 YMM11 YMM12 YMM13 YMM14 YMM15 YMM16 YMM17 YMM18 YMM19 YMM20 YMM21 YMM22 YMM23 YMM24 YMM25 YMM26 YMM27 YMM28 YMM29 YMM30 YMM31 ZMM0 ZMM1 ZMM2 ZMM3 ZMM4 ZMM5 ZMM6 ZMM7 ZMM8 ZMM9 ZMM10 ZMM11 ZMM12 ZMM13 ZMM14 ZMM15 ZMM16 ZMM17 ZMM18 ZMM19 ZMM20 ZMM21 ZMM22 ZMM23 ZMM24 ZMM25 ZMM26 ZMM27 ZMM28 ZMM29 ZMM30 ZMM31 R8B R9B R10B R11B R8D R9D R10D R11D R8W R9W R10W R11W foo does not have any function calls so I don't understand this result. |
test/CodeGen/X86/ipra-reg-usage.ll | ||
---|---|---|
6 ↗ | (On Diff #61166) | The callee saved registers as per CC has been added after function calls clobbered regmask has been considered. The clobbered registers are less compare to previous order. |
test/CodeGen/X86/ipra-reg-usage.ll | ||
---|---|---|
6 ↗ | (On Diff #61166) | This is correct detail , and here R11 is marked clobbered as preserve_allcc does not preserve R11, it can be verified as follow: // All GPRs - except r11 def CSR_64_RT_MostRegs : CalleeSavedRegs<(add CSR_64, RAX, RCX, RDX, RSI, RDI, R8, R9, R10, RSP)>; // All registers - except r11 def CSR_64_RT_AllRegs : CalleeSavedRegs<(add CSR_64_RT_MostRegs, (sequence "XMM%u", 0, 15))>; |
test/CodeGen/X86/ipra-reg-usage.ll | ||
---|---|---|
6 ↗ | (On Diff #61166) | Why do we clobber XMM* YMM* and ZMM*? Is it also expected? |
test/CodeGen/X86/ipra-reg-usage.ll | ||
---|---|---|
6 ↗ | (On Diff #61166) | YMM* and ZMM* are clobbered due to reg mask of bar1 and bar2 has them clobbered. |
test/CodeGen/X86/ipra-reg-usage.ll | ||
---|---|---|
6 ↗ | (On Diff #61166) | My question is more about the "preserve_allcc". |
test/CodeGen/X86/ipra-reg-usage.ll | ||
---|---|---|
6 ↗ | (On Diff #61166) | as you can see CSR_64_RT_MostRegs do not add YMM* or ZMM* so preserve_allcc does not preserve them. |
test/CodeGen/X86/ipra-reg-usage.ll | ||
---|---|---|
6 ↗ | (On Diff #61166) | again preserve_allcc preserves XMM0 to XMM15 that can be seen from CSR_64_RT_AllRegs but it does not preserve XMM16 onwards |
lib/CodeGen/RegUsageInfoCollector.cpp | ||
---|---|---|
113–126 | Why do you say so? The registers clobbered by the callees is exactlye what the UsedPhysRegMask in MachineRegisterInfo is about, see also the use of addPhysRegsUsedFromRegMask() in VirtRegMap.cpp/FastRegAlloc.cpp. |
lib/CodeGen/RegUsageInfoCollector.cpp | ||
---|---|---|
113–126 | // Code snippet 1 for (MachineBasicBlock &MBB : MF) { for (MachineInstr &MI : MBB) { if (!MI.isCall()) continue; for (MachineOperand &MO : MI.operands()) { if (!MO.isRegMask()) continue; const uint32_t *CalleeRegMask = MO.getRegMask(); //DEBUG(dbgs() << "Callee RegMask : \n"); //printRegMask(TRI, CalleeRegMask); for (unsigned i = 0; i < RegMaskSize; ++i) RegMask[i] = RegMask[i] & CalleeRegMask[i]; break; } } } // Code snippet 2 BitVector UsedPhysRegMask = MRI->getUsedPhysRegsMask(); for (unsigned i = 0; i < RegMaskSize; ++i) RegMask[i] = RegMask[i] & UsedPhysRegMask[i]; @MatzeB consider following simple test case for comparing Code snippet 1 (CS1) target triple = "x86_64-unknown-unknown" define void @bar1() { ret void } define preserve_allcc void @foo()#0 { call void @bar1() call void @bar2() ret void } define void @bar2() { ret void } attributes #0 = {nounwind} with CS1 for this test case bar1 and bar2 does not clobbered any register how ever now as with CS1 we are getting correct code for test-suite I don't think we should |
I think now this is in good shape. I don't have commit access so if this is ok now then please commit it.
test/CodeGen/X86/ipra-reg-usage.ll | ||
---|---|---|
6 ↗ | (On Diff #61424) | I am really confused with this result, because locally I added a print method to see which regs are clobbered in mask and I verified the output. But it seems I need to dig more about this this indicates perhaps with 2 loop approach I have missed something |
test/CodeGen/X86/ipra-reg-usage.ll | ||
---|---|---|
6 ↗ | (On Diff #61424) | If you're confused with this result, you shouldn't ask us to commit this patch, but should first make sure we all have a good understanding of what's going on here. |
test/CodeGen/X86/ipra-reg-usage.ll | ||
---|---|---|
6 ↗ | (On Diff #61424) | I am looking into it. |
@mehdi_amini ,@MatzeB
Here is my analysis:
BitVector UsedPhysRegMask = MRI->getUsedPhysRegsMask(); for (unsigned i = 0; i < RegMaskSize; ++i) RegMask[i] &= ~UsedPhysRegMask[i];
The above code is buggy because UsedPhysRegMask is indexed by register no where
as RegMask is vector integers in which each bit represenet status of one register.
I am sorry for that, I assumed that in LLVM all regmask are following same representation.
// Code snippet 1 (CS1) for (unsigned PReg = 1, PRegE = TRI->getNumRegs(); PReg < PRegE; ++PReg) if (!MRI->reg_nodbg_empty(PReg) && MRI->isPhysRegUsed(PReg)) markRegClobbered(TRI, &RegMask[0], PReg);
// Code snippet 2 (CS2) for (unsigned PReg = 1, PRegE = TRI->getNumRegs(); PReg < PRegE; ++PReg) if (MRI->isPhysRegUsed(PReg)) markRegClobbered(TRI, &RegMask[0], PReg);
Now actaully CS2 is sufficient and correct code for calculating regmask. And CS1
is having a very subtle bug which caused incorrect regmask (such as not marking
R11 as clobbered). Now why CS2 is correct and sufficient (I think @MatzeB has
pointed this fact earlier) it is because MRI::isPhysRegUsed actaully uses
UsedPhysRegMask so no need to consider that details again and MRI::isPhysRegUsed
also uses reg_nodbg_empty.
Now what is the problem with CS1, due to !MRI->reg_nodbg_empty(PReg) condition
fails when MF is not using PReg and MRI->isPhysRegUsed(PReg) would never execute.
thus generating wrong regmask. So why it happens so. Consider following definition
of reg_nodbg_empty and reg_nodbg_begin, it uses getRegUseDefListHead and if given
register is not used in side a MF then it will return true (I think it should not
Isn't reg_nodbg_empty return false if Reg is not used in MF at all?)
and due to that !MRI->reg_nodbg_empty(PReg) will be false.
/// reg_nodbg_empty - Return true if the only instructions using or defining /// Reg are Debug instructions. bool reg_nodbg_empty(unsigned RegNo) const { return reg_nodbg_begin(RegNo) == reg_nodbg_end(); } reg_nodbg_iterator reg_nodbg_begin(unsigned RegNo) const { return reg_nodbg_iterator(getRegUseDefListHead(RegNo)); }
What are your thoughts? Have I missed any thing?
@mehdi_amini @MatzeB test-suite shows some new runtime failures, these failures are different compare to previous time i.e test-suite/SingleSource/Benchmarks/Stanford/Towers.c compiles and executes fine but SingleSource/Benchmarks/Misc/fbench and some other fails.
@mehdi_amini @MatzeB the failures mentioned in above comment is due to local function related changes (setting regmask to none) I am looking more into this. @MatzeB please share your views on this.
ping! Also as I don't have commit access, who ever is going to commit this in future please commit this change before committing http://reviews.llvm.org/D21561
lib/CodeGen/MachineRegisterInfo.cpp | ||
---|---|---|
511 ↗ | (On Diff #62105) | Do you have a test for that? |
lib/CodeGen/MachineRegisterInfo.cpp | ||
---|---|---|
511 ↗ | (On Diff #62105) | No this not fix for local function related bug, but this is for code reusability. As per discussion with MtazeB and Hal method to calculate register usage is same except isNoReturnDef is not required. So we decide to add a parameter with optional value. I don't think we require test case for this. |
I don't have commit access, please some one commit this if it does not require rebase.
No need for extra curly brackets.