I was experimenting with the Loop Unroller, and found a benchmark that regressed for no obvious reason as relating to the unrolling of loops. However, I found in a loop with four outer loops, an extra unneeded copy of a register.
It was clear that the COPY was not needed, however not so clear how it should be handled. Comparing the two compilations, the regcoalescer results were identical. It was the greedy regalloc that behaved a bit differently (there were more live intervals in the unrolled loop), for one reason or other.
I looked at MCP, and found that it was missing this very simple case. I am not sure if MCP should really be a simple pass and instead the regalloc should be fixed, but I guessed that MCP has some purpose of cleaning up after regalloc, and experimented with extending it to handle my regression. Here is my patch:
- If a register use is defined by a COPY, and the COPY source reg could be used directly, use it instead if it seems that the COPY may be removable then.
- Whenever an instruction defines (clobbers) any register, remove any COPY that is tracked in MaybeDeadCopies, just like it was done for RegMask operands.
This was needed to handle:
%R4D<def> = LGR %R12D
%R7D<def> = LGR %R9D<kill>
%R1D<def> = SLLG %R14D, %noreg, 2
%R0H<def> = LFH %R4D<kill>, 0, %R1D<kill>; mem:LD4[%arrayidx35.i.i](tbaa=!6)
%R1D<def> = LGR %R5D
%R4D<def> = LGR %R3D
The COPY of R12D to R4D is unnecessary. R4D could be replaced with R12D in the LFH instruction (first point above). The COPY itself can be removed when the last instruction redefines R4D (point two above).
Why is this pass so simple? It is not trusting live-in lists, and not recomputing liveness -- is this something "todo", or? My first thought was that liveness could be checked with a reaching phys defs algorithm (see http://reviews.llvm.org/D17257), and basically the pass could be doing a bit more work then. After working with this benchmark, it became clear that even a single COPY could make a noticable difference (SPEC/bzip2).
Greatful for any feedback on this!
This looks strange. In theory we can see any number of additional implicit operands on a COPY. I assume we were just to not hit any issues because in todays practice we only add implicit defs and uses of the superregister when a subregister was used (see code in VirtRegRewriter).
Do you actually need this bugfix as part of this patch? As far as I can see COPY instructions are not renamed here so we should not run into any new problems here.
(It would still be good to upstream this fix in a separate commit through and use a loop so we also catch the case with 4 operands when we had a subregister def and use at the COPY).