Page MenuHomePhabricator

[fastalloc] Support allocate specific register class in fastalloc.
ClosedPublic

Authored by LuoYuanke on Jun 1 2022, 4:46 AM.

Details

Summary

The base RA support infrastructure that only allow a specific register
class be allocated in RA pss. Since greedy RA, basic RA derived from
base RA, they all allow allocating specific register class. Fast RA
doesn't support allocating register for specific register class. This
patch is to enable ShouldAllocateClass in fast RA, so that it can
support allocating register for specific register class.

Diff Detail

Event Timeline

LuoYuanke created this revision.Jun 1 2022, 4:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2022, 4:46 AM
LuoYuanke requested review of this revision.Jun 1 2022, 4:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2022, 4:46 AM

I thought I implemented this before. We do need to add some AMDGPU tests for this. @cdevadas recently noticed this was broken

LuoYuanke added a comment.EditedJun 1 2022, 5:57 AM

I thought I implemented this before. We do need to add some AMDGPU tests for this. @cdevadas recently noticed this was broken

It is great that you supported allocating specific register class before. Do you remember which lit test cast cover this feature? I'd would like to add similar test case for X86 target.

I think the idea of "split" special registers RA pass is good (especially for the registers who need to be config).
I notice all the "isVirtual<Register>" places need carefully to "exclude" the special registers.
Seems we may no need to do it if we can make sure that we handle these special registers before normal fast RA.
(because they have be allocated to physic registers)

I think the idea of "split" special registers RA pass is good (especially for the registers who need to be config).
I notice all the "isVirtual<Register>" places need carefully to "exclude" the special registers.
Seems we may no need to do it if we can make sure that we handle these special registers before normal fast RA.
(because they have be allocated to physic registers)

Maybe there is some misunderstanding. This patch is to allow fast RA not only allocate all virtual register but also allocate specific register (e.g. tile register). Tile register is still allocated by fast RA, but it is allocated alone, not mixed with other register class. We can observe pass pipeline in O0-pipeline.ll.

xiangzhangllvm added inline comments.Fri, Jun 17, 1:37 AM
llvm/lib/CodeGen/RegAllocFast.cpp
1161

I just take a look on this pass.
Seem this patch mainly to insert the condition where will allocated the virtual regs.

This check seems a little duplicated with MO.isReg().
Because we need to first make sure the MO is reg before we try to allocated for it.
(The only exception I find is handleDebugValue, it directly get reg for MI.getUsedDebugRegs() not MO. In fact, it also can be unified with MO)

I think we just need to replace all the "MO.isReg()" with a function which do the check of ShouldAllocateRegister

Some like:

ShouldAllocateForMO(MO){
  return MO.isReg() && ShouldAllocateRegister(MO.getReg());   
}

and keep your change in handleDebugValue (or unify the code with MO)

LuoYuanke added inline comments.Fri, Jun 17, 6:37 AM
llvm/lib/CodeGen/RegAllocFast.cpp
1161

Yes, this patch is to check the callback when allocate virtual registers, but I'm not sure it is a good idea to just add check the callback following MO.isReg(). First it may be physical register for a mahcine operand. It would crash if pass physical register to ShouldAllocateRegister(). Secondly unlikely RegAllocBase, the current infrastructure of RegAllocFast has much details in allocateInstruction().

xiangzhangllvm added inline comments.Sun, Jun 19, 7:29 PM
llvm/lib/CodeGen/RegAllocFast.cpp
1319

I think here should remove the check of "!ShouldAllocateRegister"

1419

I think here should remove the check of "!ShouldAllocateRegister"

LuoYuanke added inline comments.Sun, Jun 19, 11:20 PM
llvm/lib/CodeGen/RegAllocFast.cpp
1319

It seems keeping !ShouldAllocateRegister(Reg) can catch the unexpected bug by line 1404 where tile register has been allocated to tile physical register but it is still virtual register. I can add an additional assert here.

1419

It seems keeping !ShouldAllocateRegister(Reg) can catch the unexpected bug by line 1404 where tile register has been allocated to tile physical register but it is still virtual register. I can add an additional assert here.

llvm/lib/CodeGen/RegAllocFast.cpp
1177

PLS recheck the ShouldAllocateRegister for physic regs.
(We should not handle (reload/spill for) the un-expected physic regs in the pass.)

1419

That is well too.

LuoYuanke updated this revision to Diff 438366.Mon, Jun 20, 6:13 AM

Address Xiang's comments and add a test case to clobber tmm register.

xiangzhangllvm added inline comments.Mon, Jun 20, 5:53 PM
llvm/lib/CodeGen/RegAllocFast.cpp
1178

Are you handling here too ?
Here when we allocating tile registers (we don't want allocation/handle other non-tile registers). But the logic here is possible to handle other non-tile registers, for example we may reload for non-tile registers in definePhysReg.
This may not has correction problem, but it is very strange to handle the non-tile registers in tile register fast allocation. And even in non-AMX program, the problem still existed.

LuoYuanke updated this revision to Diff 438630.Tue, Jun 21, 3:59 AM

Fix AMDGPU test case failure.

LuoYuanke added inline comments.Tue, Jun 21, 4:02 AM
llvm/test/CodeGen/AMDGPU/collapse-endcf.ll
425

This is caused by 2 RA pass. Previous allocating specific register class doesn't work in fast RA.

llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll
43

Some register class is not allocated in the first fast RA pass.

LuoYuanke added inline comments.Tue, Jun 21, 4:39 AM
llvm/lib/CodeGen/RegAllocFast.cpp
1178

For physical register it is not filtered. I think the pass won't do anything for physical registers given there is no virtual register for the corresponding register class.

@arsenm , could you take a look at the AMDGPU test case?

cdevadas added inline comments.Wed, Jun 22, 12:52 AM
llvm/test/CodeGen/AMDGPU/collapse-endcf.ll
425

Yes, this is the right thing. The buffer_load_dword (the VGPR spill) should appear at the block prolog as they are inserted during 2 RA pass. In the left-hand side they are incorrectly inserted during the 1 RA pass which is supposed to allocate only SGPRs.

cdevadas added inline comments.Wed, Jun 22, 12:56 AM
llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll
43

That's right. VGPRs during the 2nd RA pass.

LuoYuanke updated this revision to Diff 438949.Wed, Jun 22, 2:33 AM

Add test case to stop after fast RA.

LGTM, however I think the x86 part should be split into a separate patch

xiangzhangllvm added inline comments.Wed, Jun 22, 5:47 PM
llvm/lib/CodeGen/RegAllocFast.cpp
1178

LGTM, Just a flaw here:
e.g. The GPRs' physic regs will be handled 2 times.

LuoYuanke updated this revision to Diff 439232.Wed, Jun 22, 6:19 PM

Remove X86 related code.

LuoYuanke edited the summary of this revision. (Show Details)Wed, Jun 22, 6:26 PM
This revision is now accepted and ready to land.Wed, Jun 22, 8:33 PM
LuoYuanke updated this revision to Diff 439255.Wed, Jun 22, 8:36 PM

Removed the change of X86 test case.

This revision was landed with ongoing or failed builds.Wed, Jun 22, 11:42 PM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Thu, Jun 23, 3:21 AM

Looks like this breaks check-clang: http://45.33.8.238/linux/79438/step_7.txt

Please take a look and revert for now if it takes a while to fix.

On many of the official bots too, eg https://lab.llvm.org/buildbot/#/builders/109/builds/41178 (did that not generate an email?)

On many of the official bots too, eg https://lab.llvm.org/buildbot/#/builders/109/builds/41178 (did that not generate an email?)

I'll take a look at it. If can't be solved soon, I would revert it first.

This also broke check-llvm on macOS: http://45.33.8.238/macm1/37938/step_11.txt (this both is an M1 mac, not sure if intel vs arm matter).

Please take a look at that too, and revert for now if it takes a while to fix.

(This failure was masked by the check-clang failure. But I locally bisected and verified that it's due to this change, and that locally reverting fixes the check-llvm failure.)

This also broke check-llvm on macOS: http://45.33.8.238/macm1/37938/step_11.txt (this both is an M1 mac, not sure if intel vs arm matter).

Please take a look at that too, and revert for now if it takes a while to fix.

(This failure was masked by the check-clang failure. But I locally bisected and verified that it's due to this change, and that locally reverting fixes the check-llvm failure.)

I don't have macOS environment, but from the log it seems some virtual register don't have a register class. Is there any full calltrace for the failure?

I also bisected an lldb failure down to this: https://lab.llvm.org/buildbot/#/builders/96/builds/25038/steps/6/logs/stdio

It has backtrace of a sort but it's probably not much help, I will get hold of the reproducer files.

I also bisected an lldb failure down to this: https://lab.llvm.org/buildbot/#/builders/96/builds/25038/steps/6/logs/stdio

It has backtrace of a sort but it's probably not much help, I will get hold of the reproducer files.

Thank you! I guess it was caused by line 306, I'll try getRegBankOrNull() instead of getRegClass().

llvm/lib/CodeGen/RegAllocFast.cpp
306

I'll use getRegClassOrNull() to replace getRegClass().

I was able to duplicate this issue with --global-isel. I'll come up with a patch to fix it.

arsenm added inline comments.Thu, Jun 23, 6:02 AM
llvm/lib/CodeGen/RegAllocFast.cpp
306

No, getRegClass should be used here. Virtual registers must have a class at this point

The reproducer:

LuoYuanke added inline comments.Thu, Jun 23, 6:07 AM
llvm/lib/CodeGen/RegAllocFast.cpp
306

In goable instruction selection is it possible there is no register class for a virtual register?

arsenm added inline comments.Thu, Jun 23, 6:08 AM
llvm/lib/CodeGen/RegAllocFast.cpp
306

Classes should have been set as part of the selection process

Given that the bot's been red for quite a few hours by now and there's some disagreement about the right fix, I've reverted this in 851a5efe45a026047ba8c0262a892b9895e355bf for now.

I create a patch (D129037) to fix the global ISel issue.