Branch folder removes implicit defs if they are the only non-branching instructions in a block, and the branches do not use the defined registers. The problem is that in some cases these implicit defs are required for the liveness information to be correct.
After branch optimizations, regenerate those implicit defs that are still necessary.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Hmm BranchFolder::OptimizeImpDefsBlock() has some odd logic indeed, I wonder why nobody found problems with it earlier.
The patch is slightly odd as well. BranchFolder::RegenerateImpDefsInBlock() should be able to restore any missing IMPLICIT_DEFS, so I am not sure why that global set of registers is kept. This also makes me feel uneasy, the approach does not target blocks where IMPLICIT_DEFS were removed (or rather their new predecessors in case they got merged), but instead attempts to fix up any block. (The entry in the ImpDefRegs set may come from a completely different basic block). Nonetheless given that this appears to be broken right now I would not be opposed to the patch when some of the points below are addressed:
lib/CodeGen/BranchFolding.cpp | ||
---|---|---|
160 | What was wrong with SmallSet? SmallSet is usually the better choice over std::set because it provides inline storage and I would consider a small number of elements likely here. Also std::set is ordered and therefore usually not implemented with more efficient hashing. | |
167–169 | Maybe only put "Reg" into the set. And instead does some TRI.regsOerlap() test in the 2nd loop. This may be slightly more efficient anyway as I would expect the tupical unconditional branch has no inputs so there is nothing to test. But the real benefit would be to not have all those subregs in the ImpDefRegs set later. | |
200–202 | This should probably not be restricted to physregs. The pass description claims that the pass should also work with virtual regs. (NVPTX and WebAsm are still using virtual registers late in the codegen pipeline). | |
215 | Using stepForward is discouraged. It will result in overly conservative liveness in case of missing kill flags. If possible go for stepBackward(). Proposal for a different strategy: check operands: on reg define: remove reg from the copy set on reg kill: check if reg is still in the copy set, if so append IMPLICIT_DEF at the end simulate liveness on the normal set when reaching the beginning compare liveregs with liveins, for missing regs add IMPLICIT_DEF instruction. | |
275 | SmallSet? | |
302 | All of this can be skipped if ImpDefRegs.empty() or MachineRegisterInfo::tracksLiveness() returns false. |
It didn't start that way, but in some Thumb testcase it produced an implicit def of LR, which caused some CHECK-NEXT line to fail (implicit defs show up in comments in the final assembly). I thought LR would be reserved on ARM, but it's not (apparently deliberately) and I didn't want to touch anything there. That is really the only motivation for this global set.
This also makes me feel uneasy, the approach does not target blocks where IMPLICIT_DEFS were removed (or rather their new predecessors in case they got merged), but instead attempts to fix up any block.
The reason is that the anticipated optimizations could do just about anything with the rest of the code: merge block, split blocks, duplicate, etc. I thought that keeping track of what happened to what block would be too convoluted and too error-prone.
lib/CodeGen/BranchFolding.cpp | ||
---|---|---|
160 | SmallSet does not provide iterators: that was the reason I used set. | |
200–202 | I use LivePhysRegs to establish available registers. It does not work with virtual registers. |
A register may be non-allocatable but still not reserved, we do track liveness for those registers.
This seems like an example for the reservations I had: Was the pass cleaning up something that was broken before anyway which I maybe should have left alone? Was the IMPLICIT_DEF in that case legit so we should rather adjust the CHECKs. Having a global set seems like a bad idea to restrict this maybe a legit IMPLICIT_DEF of LR was removed in another block...
This also makes me feel uneasy, the approach does not target blocks where IMPLICIT_DEFS were removed (or rather their new predecessors in case they got merged), but instead attempts to fix up any block.
The reason is that the anticipated optimizations could do just about anything with the rest of the code: merge block, split blocks, duplicate, etc. I thought that keeping track of what happened to what block would be too convoluted and too error-prone.
I understand but it doesn't convince me that the course taken is a good one :)
lib/CodeGen/BranchFolding.cpp | ||
---|---|---|
160 | Bummer. LLVM also features DensetSet which I would still consider a better choice than std::set here where we don't need a stable order. | |
200–202 | I just realized that block live in lists only contain PhysRegs right now. NVPTX and WebAsm both seem to have liveness tracking disabled past regalloc. So the physreg check can be an assert() instead. |
I want to abandon this patch. I don't think there is a good way of making this work without some form of liveness analysis.
What was wrong with SmallSet? SmallSet is usually the better choice over std::set because it provides inline storage and I would consider a small number of elements likely here. Also std::set is ordered and therefore usually not implemented with more efficient hashing.