This is an archive of the discontinued LLVM Phabricator instance.

[InlineSpiller] Fix a crash due to lack of forward progress from remat
ClosedPublic

Authored by reames on Dec 11 2017, 4:14 PM.

Details

Summary

This is a very ugly fix and I'm hoping someone has a better idea on how to fix this. What's going on is we've got an instruction with more vreg uses than there are physical registers, and rematerialization appears to assume that there's always at least one physical register available in the user instruction. At the moment, the only instructions I know of which have this problem are PATCHPOINT, STATEPOINT, and STACKMAP, but in theory, any instruction which expects to work on a lot of stack slots at once could have this problem.

(The zexts in the test case are simply a stand-in for any rematable operation which does not fold into the use.)

The only other fix I see here is to separate the spiller and remat logic entirely and expose remat as a distinct step within the register allocator. There are some arguments in favour of this - in particular, right now the splitter is much less good at remat then the spiller is - but it's a large restructuring for what is arguably a cornercase.

Diff Detail

Repository
rL LLVM

Event Timeline

reames created this revision.Dec 11 2017, 4:14 PM

Hi Philip,

I agree that a refactoring of how remat is done would be a good thing, but also only doing it for this corner case is overkill.

The current approach makes sense, but I believe we should use pressure sets or something along those lines, because the current logic won't work generally speaking. See the inline comment.

Cheers,
-Quentin

lib/CodeGen/InlineSpiller.cpp
588 ↗(On Diff #126466)

Counting by RC is not going to give you the result you want.
You can imagine some instruction with different RCs that overlap, e.g., GPRsp, GPR.
With the current counting your going to count them in different bucket and miss the overlap.

I believe the pressure sets (look in TargetRegisterInfo) would do what you want. Double check because it's been a while since I've played with it.

reames updated this revision to Diff 127920.Dec 21 2017, 11:18 AM

Second attempt, this time accounting for sub-registers

reames added inline comments.Dec 21 2017, 11:21 AM
lib/CodeGen/InlineSpiller.cpp
588 ↗(On Diff #126466)

I looked at pressure sets and they didn't seem to have an obvious application here, but they did lead me to the notion of RegUnits. I think the updated code addresses the aliasing concern you raise, can you confirm? I'm figuring this out as I go along and am not familiar with this area of the code.

qcolombet requested changes to this revision.Jan 5 2018, 10:07 AM
qcolombet added inline comments.
lib/CodeGen/InlineSpiller.cpp
530 ↗(On Diff #127920)

Just to comment on the TODO, this code is actually not conservative as it assumes all the registers of the largest legal super class are a suitable assignment for the current register.
This is generally not true, since this won't take into account the constraints of the encoding of MI.

Anyhow, sketching a potential solution in the comment below.

543 ↗(On Diff #127920)

This is kind of a random check because:

  1. Not all operands in MI are registers
  2. Not all operands in MI have the same constraints and thus AllocationOrder may not be relevant for all of them
575 ↗(On Diff #127920)

I believe the solution should look something like this:
const TargetRegisterClass *RelaxedRC = getLargestLegalSuperClassConstraintedOnMIUsage(MI, VReg) // look at getNumAllocatableRegsForConstraints in RegAllocGreedy

if (!RelaxedRC)
return false; // Could be a fatal error because if we end up in this situation that means the allocation problem is not feasible

LiveRegUnits Used(TRI)
Used.accumulate(MI)

// Walk the registers in RelaxedRC and check if it exists one which has all its regunits not available
for (MCPhysReg PossibleReg : RelaxedRC) {

if (MRI.isReserved(PossibleReg))
  continue;
bool IsAvailable = true;
for (MCRegUnitIterator Units(PossibleReg, TRI); Units.isValid(); ++Units)
   if (!Used.available(Units)) {
     IsAvailable = false;
    break;
  }
if (IsAvailable)
  return true;

}
return false;

This revision now requires changes to proceed.Jan 5 2018, 10:07 AM
reames updated this revision to Diff 151057.Jun 12 2018, 3:57 PM

Quentin, I read over your suggested approach, but honestly, I didn't entirely follow. Rather than going through another round with an approach that I clearly don't understand, I'd like to go with a quick and dirty hack for the moment if you don't mind. Are you okay with us solving this just for STATEPOINTs at the moment by essentially just disabling remats for STATEPOINT uses?

Are you okay with us solving this just for STATEPOINTs at the moment by essentially just disabling remats for STATEPOINT uses?

Works for me. Would be good to make sure every user of STATEPOINT agree with that, but I trust you would know better how to coordinate with :).

This revision was not accepted when it landed; it landed in state Needs Review.Jun 19 2018, 2:24 PM
This revision was automatically updated to reflect the committed changes.

I went ahead and submitted the workaround, but I want to record one other idea I tried. It didn't work out, but I'm not sure if that's due to something fundamental, or if I just had a non-obvious bug.

I explored introducing another step in RegAllocGreedy's LiveRangeStage after RS_Spill called RS_SpillWithoutRemat. The idea was that spilling transitioned the new vregs created not to RS_Done, but to the new state which allowed them to be spilled again if needed. By avoiding remat, we should avoid infinite retry loops. If we don't remat, then every reload use should be foldable into the STATEPOINT . Given the existing remats might have fragmented live ranges, we might end up with awful spill, reload, spill, fold patterns, but in theory it should always work.

Another framing would be to split out RS_Remat as a distinct phase before either splitting or spilling. At the moment, we have two different sets of remat logic (in splitting and spilling respectively), so I didn't want to pursue that approach since there might be lots of perturbation of the output. I also wasn't quite sure how the queue order would impact code quality if we rematted, and then defered spilling until other vregs had been processed.