This is an archive of the discontinued LLVM Phabricator instance.

[MRI] isConstantPhysReg should also check if the register is clobbered by a RegMask
Needs ReviewPublic

Authored by Carrot on Aug 1 2022, 11:00 AM.

Details

Reviewers
gberry
arsenm
Summary

Current implementation of MachineRegisterInfo::isConstantPhysReg checks if the specified physical is modified in a function, but it only checks if it is used in a Def register operand, it may still be modified in a RegMask operand, like a function call. So it can give wrong result in this situation.

Other changes are used to prevent test regressions.

Diff Detail

Event Timeline

Carrot created this revision.Aug 1 2022, 11:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 1 2022, 11:00 AM
Carrot requested review of this revision.Aug 1 2022, 11:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 1 2022, 11:00 AM
arsenm added inline comments.Aug 1 2022, 11:10 AM
llvm/lib/CodeGen/MachineRegisterInfo.cpp
591

This can just be a range loop. Also braces

llvm/lib/CodeGen/RegisterCoalescer.cpp
4116 ↗(On Diff #449070)

Adding a specific collection step here feels wrong. Other contexts the mask is added as soon as it's created. Not sure why we have the virtregrewriter adding these now

llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
104 ↗(On Diff #449070)

Don't see why this would skip regmask tests

llvm/lib/Target/Mips/MipsRegisterInfo.cpp
322–323 ↗(On Diff #449070)

This should be a separate mips change

Carrot added inline comments.Aug 1 2022, 1:32 PM
llvm/lib/CodeGen/RegisterCoalescer.cpp
4116 ↗(On Diff #449070)

I agree it's ugly. Do you have any good idea where I can collect the RegMasks?

llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
104 ↗(On Diff #449070)

To be honest, the MRI change causes two gpu tests failed, because previously there was no RegMask information collected, the checking of RegMask in isPhysRegUsed was actually nop. Now with RegMask collected, isPhysRegUsed behaves differently. I am completely ignorant with AMD GPU, so I just kept its original behavior by skipping regmask test.

Do you think it's better to enable the regmask test and send out the modified tests for review?

llvm/lib/Target/Mips/MipsRegisterInfo.cpp
322–323 ↗(On Diff #449070)

I sent out D130932 for it.

It's been a while since I looked at constant registers but I made a few changes in CHERI LLVM to avoid unnecessary copies. I just had a quick look at some of the commits and https://github.com/CTSRD-CHERI/llvm-project/commit/8588d8b81458ed6d87b674893e7752e6a6915574 looks somewhat related.

Uploaded the patch I committed to CHERI LLVM as https://reviews.llvm.org/D131958.

Can this be closed with D131958?

Can this be closed with D131958?

I think so (about to commit that revision once I check tests still pass), but it's hard to say if it covers all cases this revision does since there are no tests.

Carrot marked an inline comment as done.Sep 20 2022, 8:01 AM

Can this be closed with D131958?

I think so (about to commit that revision once I check tests still pass), but it's hard to say if it covers all cases this revision does since there are no tests.

No, a phyiscal register is not a target constant register, but if it is not modified inside a specific function, it can still be treated as a constant register by MachineRegisterInfo::isConstantPhysReg inside the specific function. The modification should include both defined register operand and regmask.

I will update this patch. But it still depends on D132987 and D132561 to prevent regressions. These are real problems exposed by this patch.

Carrot updated this revision to Diff 462551.Sep 23 2022, 11:04 AM

Update the patch to collect RegMask information whenever a new RegMask operand is created or its content is changed.

arsenm added inline comments.Sep 23 2022, 1:27 PM
llvm/lib/CodeGen/RegUsageInfoPropagate.cpp
147–149

Won't this be reflected in the regmask updates (BTW I think this pass needs to be changed to directly modify the regmasks in the instruction instead of the analysis)?

Carrot added inline comments.Sep 23 2022, 1:59 PM
llvm/lib/CodeGen/RegUsageInfoPropagate.cpp
147–149

Query register information through MRI is precise through the linked Register operands. But RegMask is different, all RegMask information is cumulated in MRI.UsedPhysRegMask. So usually we can only add new used physical registers into UsedPhysRegMask. This pass is used to reduce used physical registers, we need to reset UsedPhysRegMask and collect RegMask again.

nhaehnle removed a subscriber: nhaehnle.Oct 6 2022, 2:47 AM
Carrot updated this revision to Diff 471353.Oct 27 2022, 6:38 PM

Any other comments?

foad added a comment.Oct 28 2022, 2:19 AM

Why are there so many changes in AMDGPU codegen, even in tests that do not contain function calls?

Why are there so many changes in AMDGPU codegen, even in tests that do not contain function calls?

In a previous comment I mentioned I'm completely ignorant with AMDGPU, the instructions looks very different when compared with traditional CPU. So it's better to have an AMDGPU expert to have a look at these changes.

I tried to debug the simplest one wwm-reserved.ll. The difference comes from pass si-pre-allocate-wwm-regs, the related code is

...
160B      dead $sgpr30_sgpr31 = SI_CALL %8:sreg_64, @called, <regmask $noreg $exec $exec_hi $exec_lo $flat_scr $flat_scr_hi $flat_scr_hi_ci $flat_scr_hi_vi $flat_scr_lo $flat_scr_lo_ci $flat_scr_lo_vi $flat_scr_ci $flat_scr_vi $fp_reg $lds_direct $mode $pc_reg $private_rsrc_reg $scc $sgpr_null $sgpr_null_hi $sp_reg $src_execz $src_pops_exiting_wave_id $src_private_base $src_private_limit $src_scc $src_shared_base $src_shared_limit $src_vccz $tba $tba_hi $tba_lo and 6831 more...>, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $vgpr0, implicit-def $vgpr0
168B      %10:vgpr_32 = COPY $vgpr0
...

When this pass processing the COPY instruction, function SIPreAllocateWWMRegs::processDef tries to find a physical register for virtual register %10. It tries every physical register in class vgpr_32 to see if it fits.

...
  for (MCRegister PhysReg : RegClassInfo.getOrder(MRI->getRegClass(Reg))) {
    if (!MRI->isPhysRegUsed(PhysReg) &&                                                                             // Different result
        Matrix->checkInterference(LI, PhysReg) == LiveRegMatrix::IK_Free) {
      Matrix->assign(LI, PhysReg);
      assert(PhysReg != 0); 
      RegsToRewrite.push_back(Reg);
      return true;
    }   
  }
...

When PhysReg is $vgpr1, MRI->isPhysRegUsed(PhysReg) computes different result because of this patch. Function MRI->isPhysRegUsed checks regmask before register operands. But regmask information was not collected until very late, so at this time MRI doesn't have any regmask information, and there is no explicit $vgpr1 in the function, so MRI->isPhysRegUsed returns true for $vgpr1, and it is assigned to %10.

With this patch, MRI regmask information is updated whenever there is a regmask change, and recollected in reg-usage-propagation pass. So when $vgpr1 is passed to MRI->isPhysRegUsed, false is returned because $vgpr1 is used in @called and propagated to MRI in current function, it means $vgpr1 is not appropriate for %10, and next physical register $vgpr2 is tried.

Although the original behavior is correct for this specific function, but it is very dangerous. If we have another function call to @called inside the live range of %10, wrong code can be generated.

foad added inline comments.Nov 1 2022, 4:50 AM
llvm/lib/CodeGen/MachineCSE.cpp
280 ↗(On Diff #471353)

Can you please split this part out into a separate patch for review? It causes significant codegen changes for AMDGPU and we need to be sure that the changes are OK.

llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll
1 ↗(On Diff #471353)

I don't understant why these diffs are included in the patch. Can you double check? I don't think your patch affects codegen for any of:

LLVM :: CodeGen/AMDGPU/GlobalISel/udiv.i64.ll
LLVM :: CodeGen/AMDGPU/GlobalISel/urem.i64.ll
LLVM :: CodeGen/AMDGPU/sgpr-control-flow.ll
Carrot added a comment.Nov 1 2022, 7:06 PM

@foad, thanks a lot for reviewing AMDGPU changes!

llvm/lib/CodeGen/MachineCSE.cpp
280 ↗(On Diff #471353)

It is split to D137222. I also did an analysis for it.

llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll
1 ↗(On Diff #471353)

Will investigate it.

Carrot added inline comments.Nov 3 2022, 2:42 PM
llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll
1 ↗(On Diff #471353)

They are also caused by the change in MachineCSE. They don't show up in D137222 because there are other changes in MachineCSE::isPRECandidate between the two versions of my two clients. They will disappear after rebaseing.

Carrot updated this revision to Diff 475580.Nov 15 2022, 2:26 PM

Rebase this patch after committing the MachineCSE part.

Carrot added a comment.Dec 5 2022, 3:27 PM

Any other comments?

Carrot updated this revision to Diff 482263.Dec 12 2022, 2:08 PM

Add a test case to show that llvm can generate wrong instructions without this patch.

Carrot updated this revision to Diff 484094.Dec 19 2022, 3:02 PM

Rebase the code.

The new failed test pr59305.ll is another case that wrong code was generated without this patch.

Several DIV instructions crossing function call to fesetround got CSEd, fesetround is used to change rounding mode, so these DIV instructions may generate different results, they can't be CSEd.