This is an archive of the discontinued LLVM Phabricator instance.

[fastregalloc] Enhance the heuristics for liveout in self loop.
ClosedPublic

Authored by LuoYuanke on May 6 2022, 12:49 AM.

Details

Summary

For below case, virtual register is defined twice in the self loop. We
don't need to spill %0 after the third instruction %0 = def (tied %0),
because it is defined in the second instruction %0 = def.

1 bb.1
2 %0 = def
3 %0 = def (tied %0)
4 ...
5 jmp bb.1

Diff Detail

Event Timeline

LuoYuanke created this revision.May 6 2022, 12:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 6 2022, 12:49 AM
LuoYuanke requested review of this revision.May 6 2022, 12:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 6 2022, 12:49 AM
LuoYuanke updated this revision to Diff 427586.May 6 2022, 3:32 AM

Fix test case failure.

LuoYuanke updated this revision to Diff 427588.May 6 2022, 3:36 AM

Fix typo in commends.

MatzeB requested changes to this revision.May 9 2022, 9:25 AM

FWIW: This is adding another case of quadratic runtime in the amount of instructions in the basic block: We can potentially hit the check for every instruction in the basic block and then use dominates and end up scanning over the majority of the basic block again. We have to be careful because the fast regallocators primary quality is being fast, being good comes second.

That said the changes in the tests indicate that this is common enough that it may be worth having?

Either way in the current form this seems incorrect to me (see comment).

llvm/lib/CodeGen/RegAllocFast.cpp
363–368

This is ignoring any definitions outside of MBB but we really have to abort for correctness, don't we?

This revision now requires changes to proceed.May 9 2022, 9:25 AM

Only loosely relevant to this patch, but it seems sort of silly to me to do all this work to reconstruct information that we already had before PHIElimination.

MatzeB added a comment.May 9 2022, 1:01 PM

Only loosely relevant to this patch, but it seems sort of silly to me to do all this work to reconstruct information that we already had before PHIElimination.

Well the natural thing to do would be a liveness analysis, but I'm also not sure that is in the spirit of regallocfast.

LuoYuanke updated this revision to Diff 428346.May 10 2022, 5:16 AM

Address Braun's comments and manully change case
fastregalloc-self-loop-heuristic.mir.

FWIW: This is adding another case of quadratic runtime in the amount of instructions in the basic block: We can potentially hit the check for every instruction in the basic block and then use dominates and end up scanning over the majority of the basic block again. We have to be careful because the fast regallocators primary quality is being fast, being good comes second.

That said the changes in the tests indicate that this is common enough that it may be worth having?

The addtionaly check only happens when it is self loop block, and there are multiple def in current MBB, so I think it won't cause the compiling-time regression for fastregalloc. I am developing regalloc for AMX registers (D125075), it would spill the AMX register before fastregalloc, but here fastregalloc spill it again, so I'd like to improve it.

llvm/lib/CodeGen/RegAllocFast.cpp
363–368

Update the patch, if the MBB of def is not the same to current MBB mark the VirtReg may liv acorss blocks.

MatzeB accepted this revision.Jun 20 2022, 8:22 AM

LGTM

This revision is now accepted and ready to land.Jun 20 2022, 8:22 AM
This revision was landed with ongoing or failed builds.Jun 20 2022, 6:19 PM
This revision was automatically updated to reflect the committed changes.