Page MenuHomePhabricator

RegAllocFast: Rewrite and improve
Needs ReviewPublic

Authored by MatzeB on Sep 12 2018, 7:29 PM.

Details

Summary

This rewrites big parts of the fast register allocator. The basic strategy of
doing block-local allocation hasn't changed but I tweaked several details:

  • Track register state on register units instead of physical registers. This simplifies and speeds up handling of register aliases.
  • Process basic blocks in reverse order: Definitions are known to end register livetimes when walking backwards (contrary when walking forward then uses may or may not be a kill so we need heuristics).
  • Check register mask operands (calls) instead of conservatively assuming everything is clobbered.
  • Enhance heuristics to detect killing uses: In case of a small number of defs/uses check if they are all in the same basic block and if so the last one is a killing use.
  • Enhance heuristic for copy-coalescing through hinting: We check the first k defs of a register for COPYs rather than relying on there just being a single definition.

When testing this on the full llvm test-suite including SPEC externals I measured:

  • average 5.1% reduction in code size for X86, 4.9% reduction in code on aarch64. (ranging between 0% and 20% depending on the test)
  • 0.5% faster compiletime (some analysis suggests the pass is slightly slower than before, but we more than make up for it because later passes are faster with the reduced instruction count)

I'm still running benchmarks and I need to fix 120 tests with hardcoded registers (ugh)...

Diff Detail

Repository
rL LLVM

Event Timeline

MatzeB created this revision.Sep 12 2018, 7:29 PM
arsenm added a subscriber: arsenm.Sep 13 2018, 2:42 AM
arsenm added inline comments.
lib/CodeGen/RegAllocFast.cpp
174–179

Should comment meaning / reason for these values?

304–310

Isn't this SkipPHIsAndLabels without the handling of isBasicBlockPrologue?

469

!VRegDef

480

static unnecessary?

557–558

return after else

Hi Matthias,

Thanks for modernizing this allocator.

Could you split the patch to make the review easier?

You mentioned several points in your description, if at all possible, it would be nice to have one patch for each.

Cheers,
-Quentin

Hi Matthias,

Thanks for modernizing this allocator.

Could you split the patch to make the review easier?

You mentioned several points in your description, if at all possible, it would be nice to have one patch for each.

Cheers,
-Quentin

I can try to do that for review purposes... But some things depend on each other so the intermediate steps are probably not going to be functional...

I'm now depending on this for a change I'm working on rather than figuring out how to fix kill flag handling

I see a few different assertions in tests with this patch. Did you see these before or are they regressions?

In a few ARM tests during ARMExpandPseudo:

Assertion failed: (!(isKill && isDef) && "Kill flag on def"), function CreateReg, file ../include/llvm/CodeGen/MachineOperand.h, line 76

In one test:
Assertion failed: ((LRI->LastUse == &MI || MO.isTied() || !isRegUsedInInstr(LRI->PhysReg)) && "assigned reg free"), function useVirtReg, file ../lib/CodeGen/RegAllocFast.cpp, line 723.

In a few different targets' tests:
Assertion failed: (MBB != &MBB->getParent()->front() && "no reload in start block"), function reloadAtBegin, file ../lib/CodeGen/RegAllocFast.cpp, line 320.

  • I am slowly adatpting all the tests to the changes here; I have a commit fixing ~100 of them but there is still maybe 30-40 to go.
  • I believe I have the first two asserts fixed locally here.
  • The 3rd assert is picking up invalid MIR (a value is used before it is defined), my local version now has a commandline flag that allows us to disable that assert for the couple tests that show invalid MIR (planning to file PRs for the target owners to fix, so we can remove the commandline flag once the last test is fixed).

I see a few different assertions in tests with this patch. Did you see these before or are they regressions?

In a few ARM tests during ARMExpandPseudo:

Assertion failed: (!(isKill && isDef) && "Kill flag on def"), function CreateReg, file ../include/llvm/CodeGen/MachineOperand.h, line 76

In one test:
Assertion failed: ((LRI->LastUse == &MI || MO.isTied() || !isRegUsedInInstr(LRI->PhysReg)) && "assigned reg free"), function useVirtReg, file ../lib/CodeGen/RegAllocFast.cpp, line 723.

In a few different targets' tests:
Assertion failed: (MBB != &MBB->getParent()->front() && "no reload in start block"), function reloadAtBegin, file ../lib/CodeGen/RegAllocFast.cpp, line 320.

Can you post the updated version?

MatzeB updated this revision to Diff 170799.Oct 23 2018, 6:13 PM

Updated version

I'm now depending on this for a change I'm working on rather than figuring out how to fix kill flag handling

I'm not sure what exactly you mean by that. But just as a warning: This patch will lead to more kill flags being added than before, but it does not guarantee that every last use has a kill flag (especially vregs live across blocks will still lack kill flags).

I'm now depending on this for a change I'm working on rather than figuring out how to fix kill flag handling

I'm not sure what exactly you mean by that. But just as a warning: This patch will lead to more kill flags being added than before, but it does not guarantee that every last use has a kill flag (especially vregs live across blocks will still lack kill flags).

This patch avoids a problem I'm having applying my partial allocation patch to RegAllocFast. It seems to have a bug now where the first time a physical register is used, it immediately sets kill on it even though there are further uses of the same def

MatzeB added a comment.Nov 6 2018, 3:05 PM

It seems to have a bug now where the first time a physical register is used, it immediately sets kill on it even though there are further uses of the same def

You mean for a pre-assigned physreg that looks something like this:
bb.0:

use $somereg ...

...

use $somereg ...

we add a kill flag on the first use? Will see if I figure this out, I'm surprised this didn't hit me in unittests then...

MatzeB updated this revision to Diff 172860.Nov 6 2018, 3:07 PM
  • Adapted all the unit tests
  • Slight tuning of register hinting code
  • Improved allocation heuristic for livethrough and earlyclobber registers, allowing us to allocate more inline assembly constructs consistently (rather than the previous allocator doing it by luck).

This passes ninja check-llvm for the first time.

Next steps:

  • Code cleanup
  • Splitting into multiple commits
  • Re-run test-suite benchmarks to gather correctness, compiletime, code size, and execution-speed metrics
arsenm added a comment.Nov 6 2018, 3:08 PM

It seems to have a bug now where the first time a physical register is used, it immediately sets kill on it even though there are further uses of the same def

You mean for a pre-assigned physreg that looks something like this:
bb.0:

use $somereg ...

...

use $somereg ...

we add a kill flag on the first use? Will see if I figure this out, I'm surprised this didn't hit me in unittests then...

Yes. It might have specifically been when it was a live-in. I don't see it with this patch, so I didn't spend too much time trying to understand why

@mips target owners: Please take a look at the unittests where I had to use -rafast-ignore-missing-defs to workaround invalid machine code. Those tests use a virtual register before it is defined (the old regalloc fast used to just ignore this in some cases).

atanasyan added a subscriber: abeserminji.

@abeserminji: Aleksandar, could you take a look at the RegAllocFast::reloadAtBegin? It looks like a necessity of using the rafast-ignore-missing-defs option is caused by the rL336328 commit.

@abeserminji: Aleksandar, could you take a look at the RegAllocFast::reloadAtBegin? It looks like a necessity of using the rafast-ignore-missing-defs option is caused by the rL336328 commit.

BTW: You don't even need to apply this patch or look at any of the changes here. Just the dump the machine code of the tests before regalloc might be enough to see which vreg has no definition.

I extracted a couple of obvious cleanups to reduce the code churn here. They landed in rL346287, rL346288, rL346289, rL346296, rL346297, rL346298
(will update and split the diff here soon)

MatzeB added a comment.Nov 9 2018, 4:41 PM
  • I split up two more NFC commits: rL346575 and rL346576.
  • I split several step by step patches to ease review (note that I did not bother to fix hundreds of testcases for the intermediate steps, so I will not commit them independently): D54364, D54365, D54366, D54367, D54368
MatzeB updated this revision to Diff 173474.Nov 9 2018, 4:42 PM
  • Simpler patch now that more things have been split up into separate review and NFC patches.
MatzeB added a comment.Nov 9 2018, 4:47 PM

Some notes on what is left here:

  • Change allocation direction to go from end to begin of a basic block.
  • Change debug value handling: The tricky thing with the changed direction is that we dealing with dangling DBG_VALUE instructions (they have a vreg use after the last non-debug use of a vreg) gets trickier. They are rare but occur in several unittests so we now track them separately in the code and perform a forward walk each time after the vreg is allocation by the real last use to see whether the register reached the dangling DBG_VALUE.
  • This also simplifies DBG_VALUE handling, to always referenvce stack slots in case of spills as we now place spills immediately adjacent to the definition of a value/register.
  • Generally rewrote allocateInstruction to be hopefolly somewhat simpler
  • Added some heuristic for tricky inline assembly early clobber situations: If we detect a tricky situation now, we will take all def and early-clobber like operands and sort them by the size of the register class so that we can assign tricky cases more consistently and are more likely to be successfull (this was necessary to fix some unittests that were lucky with the old allocation algorithm but unlucky with the new one. With the new heuristic we should be less dependent on luck).

From what I found in rL336328, I believe that the idea was to mark registers as implicit-def which would prevent such an error.
One of the comments say:

// The scratch registers here with the EarlyClobber | Define | Implicit
// flags is used to persuade the register allocator and the machine
// verifier to accept the usage of this register.

When I do --print-before=regallocfast I get these flags included:

...implicit-def dead early-clobber %10:gpr32

But I am not sure if that's enough, or if the new implementation is taking this into account.

arsenm added inline comments.Nov 16 2018, 12:04 PM
lib/CodeGen/RegAllocFast.cpp
414–415

This is dead code that also doesn't compile for me

arsenm added a comment.Jan 9 2019, 8:40 PM

ping, I'm depending on this set of patches for D55301

The asm clobber logic is failing for me on test/CodeGen/X86/2008-09-18-inline-asm-2.ll. It fails to allocate, and then the pass fails to not assert in the case of the reported out of registers error.

Herald added a project: Restricted Project. · View Herald TranscriptMar 20 2019, 6:28 PM
Herald added a subscriber: jdoerfert. · View Herald Transcript
arsenm added inline comments.Mar 21 2019, 11:39 AM
lib/CodeGen/RegAllocFast.cpp
1050–1051

I was able to fix the failure I see by refining this to the number of defs of a class that may alias the current register's class, though it seems a bit overkill

arsenm added inline comments.Mar 21 2019, 1:50 PM
lib/CodeGen/RegAllocFast.cpp
321–323

This is problematic for AMDGPU. The order of the spills matters in the case the registers used are needed to restore the exec mask in the target block. Additionally, getMBBBeginInsertionPoint should be using SkipPHIsAndLabels so that isBasicBlockPrologue is respected.

Because of this the updates to test/CodeGen/AMDGPU/control-flow-fastregalloc.ll are wrong