Add peephole optimization to remove redundant MOV if:
- the MOV is identical to a MOV in the immediate predecessor of MBB and
- no instruction between them modifies the destination register
Change-Id: Ibb187e3219ef641b7681e4123c7ca3fe8e1d14e6
|  Differential  D63709  
[AMDGPU] Add peephole to optimize MOV Authored by piotr on Jun 24 2019, 5:27 AM. 
Details 
Diff Detail 
 Event TimelineComment Actions This doesn't seem like the right place to handle this. This doesn't really fit with SIShrinkInstructions. I also wouldn't expect you need to do anything special to handle this case. Why doesn't MachineCSE catch this? Can you add an IR example testcase? Comment Actions Patch does not take control flow divergence into account. 
 Comment Actions This opt opportunity presents itself after the register coalescer, where inline constants are moved to the same registers (attaching IR example testcase imminently). MachineCSE is run too early to spot this and the last occurence of si-shrink-instructions is run at the right time. I think, for similar reasons some other peepholes were also placed in this pass, even though they are not about 32-bit encoding. 
 
 Comment Actions I'd like to take a look a more reduced testcase. My intuition is something else is going wrong that's not eliminating this 
 Comment Actions I do suspect the handling of the EXEC physreg use in MachineCSE. I don't see any of the typical sources of late visible constants in this testcase? 
 
 
 Comment Actions Other targets seem to not have this problem with a slightly generalized version, so I would look into how this is cleaned up there Comment Actions It seems we're missing a simplifycfg run somewhere, so maybe we're thinking of this on the wrong level entirely.. If I run simplify cfg on any of the testcase variants, this problem disappears Comment Actions Other targets seem to run SimiplifyCFG after AtomicExpand, which we are missing. Even with that disabled and the phi survives to machineinstrs, aarch64 and hexagon both avoid this Comment Actions Well spotted, I will try to look at the issue from the perspective of missing simplifycfg. Comment Actions MachineCSE should be taking care of this, but it for some reason concludes it isn't profitable: 
 Comment Actions I think things are going from from this heuristic: // Heuristics #3: If the common subexpression is used by PHIs, do not reuse
// it unless the defined value is already used in the BB of the new use.
bool HasPHI = false;
for (MachineInstr &UseMI : MRI->use_nodbg_instructions(CSReg)) {
  HasPHI |= UseMI.isPHI();
  if (UseMI.getParent() == MI->getParent())
    return true;
}The second phi use is in a flow block with only the phi, so the use is in the fall through successor. Maybe it could be relaxed to look through trivial successors, at least for blocks with only phis? Comment Actions Nice tracking down. I will abandon this review, and submit another one where I will extend the MachineCSE pass as you suggested. | ||||||||||||||||||||||||||||||||||||||||||||||
Even is block has only one predecessor that does not mean this is the reaching def of the physreg. The predecessor may (and likely will) have a different exec mask. It might work with sclaral, but not with vector moves.