This is an archive of the discontinued LLVM Phabricator instance.

RegAllocGreedy: Fix last chance recolor assert in impossible case
ClosedPublic

Authored by arsenm on Feb 16 2022, 1:26 PM.

Details

Summary

This example is not compilable without handling eviction of specific
subregisters. Last chance recoloring was deciding it could try
evicting an overlapping superregister, which doesn't help make any
progress.The LiveIntervalUnion would then assert due to an overlapping
/ identical range when trying the new assignment.

Unfortunately this is also producing a verifier error after the
allocation fails. I've seen a number of these, and not sure if we
should just start deleting the function on error rather than trying to
figure out how to put together valid MIR.

I'm not super confident this is the right place to fix this. I also
have a number of failing testcases I need to fix by handling partial
evictions of superregisters.

Diff Detail

Event Timeline

arsenm created this revision.Feb 16 2022, 1:26 PM
arsenm requested review of this revision.Feb 16 2022, 1:26 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 16 2022, 1:26 PM
Herald added a subscriber: wdng. · View Herald Transcript
qcolombet accepted this revision.Feb 16 2022, 1:58 PM

The fix looks good to me.

Out-of-curiosity, do you hit the last recoloring heuristic often?

This revision is now accepted and ready to land.Feb 16 2022, 1:58 PM

The fix looks good to me.

Out-of-curiosity, do you hit the last recoloring heuristic often?

Not sure but I've been seeing a lot of failures recently in cases like this where it would only be satisfiable if superregisters can be partially spilled

arsenm closed this revision.Feb 17 2022, 3:31 PM

effc65eb1d167d6e6f9b8340cbb3b5ef6e2302a8

I haven't looked into it yet, but we see cases of "ran out of registers during register allocation" with this patch, for code that compiled succesfully before.

I haven't looked into it yet, but we see cases of "ran out of registers during register allocation" with this patch, for code that compiled succesfully before.

Looked a bit at one failure. It's for our out of tree target so I can't share a reproducer but I'll try to roughly describe what is happening.

Our machine has 16 registers in the "A" regclass, a0-a7 and b0-b7. The registers can be address individually as a0, a1, ..., b7.
The registers can also be accessed as quads, e.g. a3210 is made up of a3, a2, a1 and a0.
There are only 4 quads: a3210, a7654, b3210 and b7654. We have a regclass "Aquads" for them.

In a failing example we have a bundle like this with heavy register use:

BUNDLE [...] {
  inst1 %234:Aquad
  inst1 %235:Aquad
  undef %235.sub0:Aquad = inst2 %211:A, %237.sub0:Aquad
  undef %235.sub1:Aquad = inst2 %226:A, %237.sub1:Aquad
  undef %235.sub2:Aquad = inst2 %229:A, %237.sub0:Aquad
  undef %235.sub3:Aquad = inst2 %232:A, %237.sub1:Aquad
}

It uses 3 Aquad and 4 A registers as input, so in total it uses 16 registers, which is exactly what we have.

When we reach last chance recoloring for %237:Aquad, we have so far done the following allocations for the vregs in the hard bundle:

[%234 -> $a3210] Aquad
[%232 -> $a4] A
[%235 -> $b7654] Aquad
[%211 -> $b0] A
[%229 -> $b2] A
[%226 -> $b3] A

So a5, a6, a7 and b1 are not used in the bundle, but we need a quad. I.e. we have a sufficient number of available registers for a quad, but they are fragmented.

Without the patch, the allocator goes on and tries to free up b3210, unassigning b0, b2 and b3 from %211, %229 and %226.
After some more work it finally succeeds.

But with this patch it stops when examining the interfering range

%211 [912r,960r:0) 0@912r  weight:INF

since b0 (which is assigned to %211) overlaps with b3210, due to the new code

TRI->regsOverlap(VRM->getPhys(Intf->reg()), PhysReg)

and then we end up with "ran out of registers".
So it really looks like the new check is too pessimistic in some way, making the allocator fail in cases where it previously could find an allocation.

In a failing example we have a bundle like this with heavy register use:

BUNDLE [...] {
  inst1 %234:Aquad
  inst1 %235:Aquad
  undef %235.sub0:Aquad = inst2 %211:A, %237.sub0:Aquad
  undef %235.sub1:Aquad = inst2 %226:A, %237.sub1:Aquad
  undef %235.sub2:Aquad = inst2 %229:A, %237.sub0:Aquad
  undef %235.sub3:Aquad = inst2 %232:A, %237.sub1:Aquad
}

Can you fill out more context with the surrounding instructions + ranges? Theoretically I can reconstruct the same scenario using -stress-regalloc

In a failing example we have a bundle like this with heavy register use:

BUNDLE [...] {
  inst1 %234:Aquad
  inst1 %235:Aquad
  undef %235.sub0:Aquad = inst2 %211:A, %237.sub0:Aquad
  undef %235.sub1:Aquad = inst2 %226:A, %237.sub1:Aquad
  undef %235.sub2:Aquad = inst2 %229:A, %237.sub0:Aquad
  undef %235.sub3:Aquad = inst2 %232:A, %237.sub1:Aquad
}

Can you fill out more context with the surrounding instructions + ranges? Theoretically I can reconstruct the same scenario using -stress-regalloc

So far by guessing I've only seen cases that fail to compile either way. However I'm guessing what we need is the assigned register is a strict superset of the interfering register

I haven't looked into it yet, but we see cases of "ran out of registers during register allocation" with this patch, for code that compiled succesfully before.

I was wondering about that.
I think it could happen if each element of the super register is recolored as well. I.e., at the one point in the recoloring the super register becomes fully available.

In a failing example we have a bundle like this with heavy register use:

BUNDLE [...] {
  inst1 %234:Aquad
  inst1 %235:Aquad
  undef %235.sub0:Aquad = inst2 %211:A, %237.sub0:Aquad
  undef %235.sub1:Aquad = inst2 %226:A, %237.sub1:Aquad
  undef %235.sub2:Aquad = inst2 %229:A, %237.sub0:Aquad
  undef %235.sub3:Aquad = inst2 %232:A, %237.sub1:Aquad
}

Can you fill out more context with the surrounding instructions + ranges? Theoretically I can reconstruct the same scenario using -stress-regalloc

So far by guessing I've only seen cases that fail to compile either way. However I'm guessing what we need is the assigned register is a strict superset of the interfering register

Does it work if you change regsOverlap to isSuperRegisterEq?

In a failing example we have a bundle like this with heavy register use:

BUNDLE [...] {
  inst1 %234:Aquad
  inst1 %235:Aquad
  undef %235.sub0:Aquad = inst2 %211:A, %237.sub0:Aquad
  undef %235.sub1:Aquad = inst2 %226:A, %237.sub1:Aquad
  undef %235.sub2:Aquad = inst2 %229:A, %237.sub0:Aquad
  undef %235.sub3:Aquad = inst2 %232:A, %237.sub1:Aquad
}

Can you fill out more context with the surrounding instructions + ranges? Theoretically I can reconstruct the same scenario using -stress-regalloc

So far by guessing I've only seen cases that fail to compile either way. However I'm guessing what we need is the assigned register is a strict superset of the interfering register

Does it work if you change regsOverlap to isSuperRegisterEq?

Changing

TRI->regsOverlap(VRM->getPhys(Intf->reg()), PhysReg)))) &&

into

TRI->isSuperRegisterEq(VRM->getPhys(Intf->reg()), PhysReg)))) &&

does not help since Intf is

%211 [912r,960r:0) 0@912r  weight:INF

and we have this allocation

[%211 -> $b0] A

And we're examining if using b3210 for %237:Aquad would work.

b0 overlaps with b3210 so the test with regsOverlap makes mayRecolorAllInterferences bail out.
And b3210 is a super reg of b0, so using isSuperRegisterEq also makes mayRecolorAllInterferences bail out.

If you meant to try out

TRI->isSuperRegisterEq(PhysReg, VRM->getPhys(Intf->reg()))))) &&

then yes, that helps since b0 is not a super register of b3210.

I dumped LIS and VRM when entering tryLastChanceRecoloring and cut out and simplified what I think are the important parts:

%211 [912r,960r:0) 0@912r  weight:INF
%226 [920r,960r:0) 0@920r  weight:INF
%229 [928r,960r:0) 0@928r  weight:INF
%232 [932r,960r:0) 0@932r  weight:INF
%234 [940r,960r:0) 0@940r  weight:INF
%235 [944r,960r:1)[960r,968r:0) 0@960r 1@944r  weight:INF
%237 [952r,960r:0) 0@952r  weight:INF

[...]

912B	  %211:A = loadA %stack.4
920B	  %226:A = loadA %stack.6
928B	  %229:A = loadA %stack.8
932B	  %232:A = loadA %stack.10
940B	  %234:Aquad = loadAquad %stack.11
944B	  %235:Aquad = loadAquad %stack.12
952B	  %237:Aquad = loadAquad %stack.0

960B BUNDLE [...] {
  inst1 %234:Aquad
  inst1 %235:Aquad
  undef %235.sub0:Aquad = inst2 %211:A, %237.sub0:Aquad
  undef %235.sub1:Aquad = inst2 %226:A, %237.sub1:Aquad
  undef %235.sub2:Aquad = inst2 %229:A, %237.sub0:Aquad
  undef %235.sub3:Aquad = inst2 %232:A, %237.sub1:Aquad
}

968B      storeAquad %235:Aquad, %stack.12

[...]

[%234 -> $a3210] Aquad
[%235 -> $b7654] Aquad
[%211 -> $b0_32] A
[%226 -> $b3_32] A
[%229 -> $b2_32] A
[%232 -> $a4_32] A

And to repeat: with the new regsOverlap check we get:

Try to assign: %237 [952r,960r:0) 0@952r  weight:INF to $a3210
Early abort: the interference is not recolorable.
Some interferences cannot be recolored.
Try to assign: %237 [952r,960r:0) 0@952r  weight:INF to $b3210
Early abort: the interference is not recolorable.
Some interferences cannot be recolored.
Try to assign: %237 [952r,960r:0) 0@952r  weight:INF to $b7654
Early abort: the interference is not recolorable.
Some interferences cannot be recolored.
Try to assign: %237 [952r,960r:0) 0@952r  weight:INF to $a7654
Early abort: the interference is not recolorable.
Some interferences cannot be recolored.
LLVM ERROR: ran out of registers during register allocation

but before, without it, it also really explored the

Try to assign: %237 [952r,960r:0) 0@952r  weight:INF to $b3210

path, and finally succeeded since that way it could shuffle the fragmented free A registers around to form a free Aquad.

lkail added a subscriber: lkail.Feb 22 2022, 12:59 AM
arsenm added a comment.Mar 3 2022, 6:29 PM

I'm beginning to think that mayRecolorAllInterferences has no obligation to skip based on the register class / subregisters, and something else is wrong. The full interference checks are done later. The main failure I'm observing happens when trying to restore the allocation state after last chance recoloring fails. It ends up trying to restore the assignment of a register which is already allocated. I suspect something is going wrong when this is recursively called

Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 6:29 PM

I'm beginning to think that mayRecolorAllInterferences has no obligation to skip based on the register class / subregisters, and something else is wrong. The full interference checks are done later. The main failure I'm observing happens when trying to restore the allocation state after last chance recoloring fails. It ends up trying to restore the assignment of a register which is already allocated. I suspect something is going wrong when this is recursively called

Not sure if it's what you are experiencing, but tryLastChanceRecoloring is broken in a few cases where "complicated" things have happened during the "try" part of it.

If recoloring fails, it tries to restore the state, but it does not restore e.g. splitting of vregs that happened.

So e.g. if one of the evicted vregs get splitted, then the allocator may end up in a corrupted state after the physreg is restored to the splitted vreg. The vreg may then have an empty live range but still have a physreg allocated, and this may later lead to a number of different assertions and crashes.

I'm beginning to think that mayRecolorAllInterferences has no obligation to skip based on the register class / subregisters, and something else is wrong. The full interference checks are done later. The main failure I'm observing happens when trying to restore the allocation state after last chance recoloring fails. It ends up trying to restore the assignment of a register which is already allocated. I suspect something is going wrong when this is recursively called

Not sure if it's what you are experiencing, but tryLastChanceRecoloring is broken in a few cases where "complicated" things have happened during the "try" part of it.

If recoloring fails, it tries to restore the state, but it does not restore e.g. splitting of vregs that happened.

So e.g. if one of the evicted vregs get splitted, then the allocator may end up in a corrupted state after the physreg is restored to the splitted vreg. The vreg may then have an empty live range but still have a physreg allocated, and this may later lead to a number of different assertions and crashes.

I think this is exactly what's happening

I'm beginning to think that mayRecolorAllInterferences has no obligation to skip based on the register class / subregisters, and something else is wrong. The full interference checks are done later. The main failure I'm observing happens when trying to restore the allocation state after last chance recoloring fails. It ends up trying to restore the assignment of a register which is already allocated. I suspect something is going wrong when this is recursively called

Not sure if it's what you are experiencing, but tryLastChanceRecoloring is broken in a few cases where "complicated" things have happened during the "try" part of it.

If recoloring fails, it tries to restore the state, but it does not restore e.g. splitting of vregs that happened.

So e.g. if one of the evicted vregs get splitted, then the allocator may end up in a corrupted state after the physreg is restored to the splitted vreg. The vreg may then have an empty live range but still have a physreg allocated, and this may later lead to a number of different assertions and crashes.

I think this is exactly what's happening

I saw the revert in 8d66603a4. Thanks!

The complicated cases are indeed not well tested since it is very hard to hit them. I was hoping Matt's fix would avoid falling in one of them, while not preventing actual success. Looks like it wasn't the case.

So e.g. if one of the evicted vregs get splitted, then the allocator may end up in a corrupted state after the physreg is restored to the splitted vreg. The vreg may then have an empty live range but still have a physreg allocated, and this may later lead to a number of different assertions and crashes.

Is this what is happening here?
Would it be enough to drop the attempted recolored vreg when it has an empty range? (At this point, its live-range should be covered by the new split vregs)

The complicated cases are indeed not well tested since it is very hard to hit them. I was hoping Matt's fix would avoid falling in one of them, while not preventing actual success. Looks like it wasn't the case.

So e.g. if one of the evicted vregs get splitted, then the allocator may end up in a corrupted state after the physreg is restored to the splitted vreg. The vreg may then have an empty live range but still have a physreg allocated, and this may later lead to a number of different assertions and crashes.

Is this what is happening here?
Would it be enough to drop the attempted recolored vreg when it has an empty range? (At this point, its live-range should be covered by the new split vregs)

I'm still working on debugging this. I sort of see what's happening, but not really. At the point of failure, it's trying to unassign and reassign a register which was added to FixedRegisters after the initial interference check. It seems like the interference check needs to account for registers which will be added to FixedRegisters in the recursive calls. At the points those points are made, the ultimate parent register isn't in FixedRegisters for some reason.

The complicated cases are indeed not well tested since it is very hard to hit them. I was hoping Matt's fix would avoid falling in one of them, while not preventing actual success. Looks like it wasn't the case.

So e.g. if one of the evicted vregs get splitted, then the allocator may end up in a corrupted state after the physreg is restored to the splitted vreg. The vreg may then have an empty live range but still have a physreg allocated, and this may later lead to a number of different assertions and crashes.

Is this what is happening here?
Would it be enough to drop the attempted recolored vreg when it has an empty range? (At this point, its live-range should be covered by the new split vregs)

I'm still working on debugging this. I sort of see what's happening, but not really. At the point of failure, it's trying to unassign and reassign a register which was added to FixedRegisters after the initial interference check. It seems like the interference check needs to account for registers which will be added to FixedRegisters in the recursive calls. At the points those points are made, the ultimate parent register isn't in FixedRegisters for some reason.

I can't articulate why, but changing the condition in mayRecolorAllInterferences to check if the stage is RS_Done or RS_Split also seems to work

The complicated cases are indeed not well tested since it is very hard to hit them. I was hoping Matt's fix would avoid falling in one of them, while not preventing actual success. Looks like it wasn't the case.

So e.g. if one of the evicted vregs get splitted, then the allocator may end up in a corrupted state after the physreg is restored to the splitted vreg. The vreg may then have an empty live range but still have a physreg allocated, and this may later lead to a number of different assertions and crashes.

Is this what is happening here?
Would it be enough to drop the attempted recolored vreg when it has an empty range? (At this point, its live-range should be covered by the new split vregs)

I'm still working on debugging this. I sort of see what's happening, but not really. At the point of failure, it's trying to unassign and reassign a register which was added to FixedRegisters after the initial interference check. It seems like the interference check needs to account for registers which will be added to FixedRegisters in the recursive calls. At the points those points are made, the ultimate parent register isn't in FixedRegisters for some reason.

I can't articulate why, but changing the condition in mayRecolorAllInterferences to check if the stage is RS_Done or RS_Split also seems to work

I think this just happens to hide it.

The problem is when you have multiple registers being recolored, and one of them succeeds and the other fails recoloring. The one that succeeded did further recoloring which is not rolled back on the failure of the second. It leaves behind registers assigned to registers which were temporarily freed up in the parent call. This needs to track much more of the allocation state to accurately roll back the recoloring attempt

The complicated cases are indeed not well tested since it is very hard to hit them. I was hoping Matt's fix would avoid falling in one of them, while not preventing actual success. Looks like it wasn't the case.

So e.g. if one of the evicted vregs get splitted, then the allocator may end up in a corrupted state after the physreg is restored to the splitted vreg. The vreg may then have an empty live range but still have a physreg allocated, and this may later lead to a number of different assertions and crashes.

Is this what is happening here?
Would it be enough to drop the attempted recolored vreg when it has an empty range? (At this point, its live-range should be covered by the new split vregs)

I'm still working on debugging this. I sort of see what's happening, but not really. At the point of failure, it's trying to unassign and reassign a register which was added to FixedRegisters after the initial interference check. It seems like the interference check needs to account for registers which will be added to FixedRegisters in the recursive calls. At the points those points are made, the ultimate parent register isn't in FixedRegisters for some reason.

I can't articulate why, but changing the condition in mayRecolorAllInterferences to check if the stage is RS_Done or RS_Split also seems to work

I think this just happens to hide it.

The problem is when you have multiple registers being recolored, and one of them succeeds and the other fails recoloring. The one that succeeded did further recoloring which is not rolled back on the failure of the second. It leaves behind registers assigned to registers which were temporarily freed up in the parent call. This needs to track much more of the allocation state to accurately roll back the recoloring attempt

I now have another attempt which tracks a stack of successful recolorings that are rolled back on failure, rather than just the current depth's recolor candidates. However, the compile time is noticeably increased in my failing testcases (in a debug build one of them takes a full 15 seconds to complete) and it unexpectedly changes the output in a SystemZ test. I'm wondering if a better strategy would be to try avoiding recolor candidates that could conflict with the saved recoloring state for fixed registers

I'm wondering if a better strategy would be to try avoiding recolor candidates that could conflict with the saved recoloring state for fixed registers

If we go down that road, I'm pretty sure we will find cases where the recoloring was working previously and that would fail after such change.
The idea is while doing the recoloring we should have access to every color that is currently available.

I now have another attempt which tracks a stack of successful recolorings that are rolled back on failure

I haven't looked at that code for quite some time so I could totally mistaken, but that's conceptually already what is supposed to happen.

The order in which we roll back a particular "level" shouldn't matter. The stacking happens already naturally with the recursive calls.
IIRC we record the coloring of each register involved in the recoloring for each level (as materialized by the local variable in the call frame) and if it fails we put them back into this level pre-recoloring state.

The one that succeeded did further recoloring which is not rolled back on the failure of the second.

Hmm, that's weird, if one register fails within the same level, we should roll back the whole level to its previous state.

Hmm, that's weird, if one register fails within the same level, we should roll back the whole level to its previous state.

Oh I think I see what you mean!
When we successfully recolor part of a level, the recoloring that happened below that level is not rolled back.

Yeah, I could totally see that problem now. I think we've been lucky so far because on failure most levels probably have only register to recolor.

Replacement patch is D122580