Page MenuHomePhabricator

[ARM][CodeGen] Enabling spilling of high registers in RegAllocFast for Thumb1
Needs ReviewPublic

Authored by pratlucas on Jun 2 2020, 6:27 AM.

Details

Summary

High registers - r8 until r12 - have no load/store instructions in
Thumb1 and, therefore, are not generally allocatable in Thumb1 mode.
When registers of this class are clobbered by inline assembly, though,
the Fast Register Allocator may be required to spill those registers.

In such conditions, the Fast Register Allocator hit an assertion failure
when trying to spill high registers to the satck through
TargetRegisterInfo's storeRegToStackSlot.

This patch enables the register allocator to use an auxiliary register
for the spill operation, introducing a new hook for the target inform
the proper register class to be used in loads/stores.

This patch is based on the discussion from D49364. The changes were written
from scratch, as there were significant changes to RegAllocFast since then.

Diff Detail

Event Timeline

pratlucas created this revision.Jun 2 2020, 6:27 AM
pratlucas updated this revision to Diff 267867.Jun 2 2020, 6:41 AM

Removing unecessary include and fixing formatting.

Harbormaster completed remote builds in B58756: Diff 267867.
thopre resigned from this revision.Jun 2 2020, 9:26 AM

This is intentionally not addressing greedy regalloc, I guess.

I think ConstantIslands currently assumes LR is never live in Thumb1 mode. Have you looked at that at all? (Last time I looked at this, there was also ThumbRegisterInfo::saveScavengerRegister, but that's gone now.)

llvm/lib/CodeGen/RegAllocFast.cpp
346

Is it possible to spuriously run out of registers? Say you have an inline asm that requires r0-r8. You allocate r0-r7, then try to allocate r8, then we run out of registers because r0-r7 were already reserved.

arsenm added inline comments.Jun 3 2020, 5:14 PM
llvm/lib/CodeGen/RegAllocFast.cpp
211

This needs rebasing on top of 66251f7e1de79a7c1620659b7f58352b8c8e892e

pratlucas updated this revision to Diff 268501.Jun 4 2020, 9:08 AM

Rebasing on top of rG66251f7e1de7.

pratlucas marked 2 inline comments as done.Jun 4 2020, 9:39 AM

This is intentionally not addressing greedy regalloc, I guess.

Yes. Currently, RegAllocFast is the only one that finds itself in need of spilling high registers. Other register allocators detect the interference between the inline asm inputs and the clobbered registers before allocating them, avoiding the spill.
If we ever want to make the high registers generally allocatable, though, the other allocators also need to be updated.

I think ConstantIslands currently assumes LR is never live in Thumb1 mode. Have you looked at that at all? (Last time I looked at this, there was also ThumbRegisterInfo::saveScavengerRegister, but that's gone now.)

From what I've checked, ConstantIslands assumes LR was spilled in the function epilog, so it is legal for it to make use of BL instructions.
This could only have an impact on explicit uses or clobbers of LR by the inline asm itself, which were already possible prior to this patch, so I believe it won't cause any issues.

llvm/lib/CodeGen/RegAllocFast.cpp
346

Yes, it is possible but in rare circumstances as the high registers aren't generally allocatable.

For this to happen the specified high register needs a reason to be spilled, so it should either be high on the allocation order or the inline asm should have enough inputs to cause it to be previously allocated.

As we can't directly spill it, the only way I see around this is to re-allocate the clobbered registers to another available register whenever possible. If you agree, I can handle this in a separate patch as it might be out of this one's scope.

arsenm added inline comments.Jun 4 2020, 12:00 PM
llvm/lib/CodeGen/RegAllocFast.cpp
662

Can you split these NFC changes into a separate patch? I want to avoid yet another rebase nightmare for D52010

1094

I don't think any of these new markRegUsedInInstr calls are correct

This could only have an impact on explicit uses or clobbers of LR by the inline asm itself, which were already possible prior to this patch, so I believe it won't cause any issues.

It won't cause any "new" issues, in some sense; it's probably feasible to write a testcase without this patch. But this patch probably makes it easier to trigger.

After taking a deeper look at ARMConstantIslandPass, I believe this patch won't actually cause any issues when interacting with it.

Please correct me if I'm wrong, but from what I've checked the pass's dependency on LR not being live is simply to allow it to replace unconditional branches that are out of range with BL instructions, which overrides the contents of LR.
Also, the pass doesn't seem to handle the contents of inline asm, keeping the user's assembly code as it is.

As LR is a high register, the only impact this patch has on it is allowing RegAllocFast to spill it when necessary - i.e. when its was intentionally clobbered by the user.
Clobbering it is the only way I see for LR to become live again and it was already a possibility before.

Note that LR is the last on the allocation order for high registers. Due to that, the only scenario I've found where we see a difference in behavior for the clobbering of LR would be when LR is required to handle the inline asm's inputs.
For example:

void constraint_h(int a, b, c, d, e, f) {
  asm volatile("mov lr, %5"
    :
    : "h" (a), "h" (b), "h" (c), "h" (d), "h" (e), "h" (f)
    : "lr");
}

Here, RegAllocFast previously hit the assertion failure and after the patch hits the following error:

error: inline assembly requires more registers than available

Other attempts to clobber LR are handled with no issues both before and after this patch.

pratlucas updated this revision to Diff 271113.Jun 16 2020, 8:53 AM

Splitting NFC changes into a separate patch.

pratlucas marked 4 inline comments as done.Jun 16 2020, 8:57 AM
pratlucas added inline comments.
llvm/lib/CodeGen/RegAllocFast.cpp
662

Done! NFC is at D81942.

1094

Those aren't actually new. I've added them here to replace the one removed from definePhysReg.

pratlucas edited reviewers, added: efriedma; removed: eli.friedman.Jun 16 2020, 8:58 AM

Please correct me if I'm wrong, but from what I've checked the pass's dependency on LR not being live is simply to allow it to replace unconditional branches that are out of range with BL instructions, which overrides the contents of LR.

Correct. The problem would be if lr were live across a branch. If an inline asm has lr as an input or output, lr has a live range which could contain a branch. Even if there aren't any such branches when regalloc runs, later transforms can introduce them. (ConstantIslands itself can introduce branches, but I'm not sure if those branches would ever need to be relaxed.)

It's definitely realistic to construct a testcase at higher optimization levels. Thinking about it a bit more, I'm having trouble coming up with a testcase that would trigger at -O0 specifically, though. It would be difficult to construct a testcase where lr is alive across a branch during isel, given the way inline asm operands are constructed, and the post-isel optimizations are very limited.

Also, the pass doesn't seem to handle the contents of inline asm, keeping the user's assembly code as it is.

Correct.