This is an archive of the discontinued LLVM Phabricator instance.

Regenerate removed implicit defs in BranchFolder where necessary
AbandonedPublic

Authored by kparzysz on Oct 7 2016, 8:15 AM.

Details

Reviewers
qcolombet
MatzeB
Summary

Branch folder removes implicit defs if they are the only non-branching instructions in a block, and the branches do not use the defined registers. The problem is that in some cases these implicit defs are required for the liveness information to be correct.
After branch optimizations, regenerate those implicit defs that are still necessary.

Diff Detail

Repository
rL LLVM

Event Timeline

kparzysz updated this revision to Diff 73934.Oct 7 2016, 8:15 AM
kparzysz retitled this revision from to Regenerate removed implicit defs in BranchFolder where necessary.
kparzysz updated this object.
kparzysz added reviewers: MatzeB, qcolombet.
kparzysz set the repository for this revision to rL LLVM.
kparzysz added a subscriber: llvm-commits.
MatzeB edited edge metadata.Oct 7 2016, 5:46 PM

Hmm BranchFolder::OptimizeImpDefsBlock() has some odd logic indeed, I wonder why nobody found problems with it earlier.

The patch is slightly odd as well. BranchFolder::RegenerateImpDefsInBlock() should be able to restore any missing IMPLICIT_DEFS, so I am not sure why that global set of registers is kept. This also makes me feel uneasy, the approach does not target blocks where IMPLICIT_DEFS were removed (or rather their new predecessors in case they got merged), but instead attempts to fix up any block. (The entry in the ImpDefRegs set may come from a completely different basic block). Nonetheless given that this appears to be broken right now I would not be opposed to the patch when some of the points below are addressed:

lib/CodeGen/BranchFolding.cpp
160

What was wrong with SmallSet? SmallSet is usually the better choice over std::set because it provides inline storage and I would consider a small number of elements likely here. Also std::set is ordered and therefore usually not implemented with more efficient hashing.

167–169

Maybe only put "Reg" into the set. And instead does some TRI.regsOerlap() test in the 2nd loop. This may be slightly more efficient anyway as I would expect the tupical unconditional branch has no inputs so there is nothing to test. But the real benefit would be to not have all those subregs in the ImpDefRegs set later.

200–202

This should probably not be restricted to physregs. The pass description claims that the pass should also work with virtual regs. (NVPTX and WebAsm are still using virtual registers late in the codegen pipeline).

215

Using stepForward is discouraged. It will result in overly conservative liveness in case of missing kill flags. If possible go for stepBackward().

Proposal for a different strategy:
Collect live out regs, make a copy of that set
check each instruction backwards:

check operands:
    on reg define: remove reg from the copy set
    on reg kill: check if reg is still in the copy set, if so append IMPLICIT_DEF at the end
simulate liveness on the normal set

when reaching the beginning compare liveregs with liveins, for missing regs add IMPLICIT_DEF instruction.

275

SmallSet?

302

All of this can be skipped if ImpDefRegs.empty() or MachineRegisterInfo::tracksLiveness() returns false.

Hmm BranchFolder::OptimizeImpDefsBlock() has some odd logic indeed, I wonder why nobody found problems with it earlier.

The patch is slightly odd as well. BranchFolder::RegenerateImpDefsInBlock() should be able to restore any missing IMPLICIT_DEFS, so I am not sure why that global set of registers is kept.

It didn't start that way, but in some Thumb testcase it produced an implicit def of LR, which caused some CHECK-NEXT line to fail (implicit defs show up in comments in the final assembly). I thought LR would be reserved on ARM, but it's not (apparently deliberately) and I didn't want to touch anything there. That is really the only motivation for this global set.

This also makes me feel uneasy, the approach does not target blocks where IMPLICIT_DEFS were removed (or rather their new predecessors in case they got merged), but instead attempts to fix up any block.

The reason is that the anticipated optimizations could do just about anything with the rest of the code: merge block, split blocks, duplicate, etc. I thought that keeping track of what happened to what block would be too convoluted and too error-prone.

lib/CodeGen/BranchFolding.cpp
160

SmallSet does not provide iterators: that was the reason I used set.

200–202

I use LivePhysRegs to establish available registers. It does not work with virtual registers.

MatzeB added a comment.Oct 7 2016, 6:46 PM

Hmm BranchFolder::OptimizeImpDefsBlock() has some odd logic indeed, I wonder why nobody found problems with it earlier.

The patch is slightly odd as well. BranchFolder::RegenerateImpDefsInBlock() should be able to restore any missing IMPLICIT_DEFS, so I am not sure why that global set of registers is kept.

It didn't start that way, but in some Thumb testcase it produced an implicit def of LR, which caused some CHECK-NEXT line to fail (implicit defs show up in comments in the final assembly). I thought LR would be reserved on ARM, but it's not (apparently deliberately) and I didn't want to touch anything there. That is really the only motivation for this global set.

A register may be non-allocatable but still not reserved, we do track liveness for those registers.
This seems like an example for the reservations I had: Was the pass cleaning up something that was broken before anyway which I maybe should have left alone? Was the IMPLICIT_DEF in that case legit so we should rather adjust the CHECKs. Having a global set seems like a bad idea to restrict this maybe a legit IMPLICIT_DEF of LR was removed in another block...

This also makes me feel uneasy, the approach does not target blocks where IMPLICIT_DEFS were removed (or rather their new predecessors in case they got merged), but instead attempts to fix up any block.

The reason is that the anticipated optimizations could do just about anything with the rest of the code: merge block, split blocks, duplicate, etc. I thought that keeping track of what happened to what block would be too convoluted and too error-prone.

I understand but it doesn't convince me that the course taken is a good one :)

lib/CodeGen/BranchFolding.cpp
160

Bummer. LLVM also features DensetSet which I would still consider a better choice than std::set here where we don't need a stable order.

200–202

I just realized that block live in lists only contain PhysRegs right now. NVPTX and WebAsm both seem to have liveness tracking disabled past regalloc. So the physreg check can be an assert() instead.

There is a better plan: D25478.

I want to abandon this patch. I don't think there is a good way of making this work without some form of liveness analysis.

This comment was removed by MatzeB.
kparzysz abandoned this revision.Oct 12 2016, 1:04 PM