This is an archive of the discontinued LLVM Phabricator instance.

Don't delete empty preheaders in CodeGenPrepare if it would create a critical edge
ClosedPublic

Authored by tjablin on Feb 8 2016, 6:49 AM.

Details

Summary

Presently, CodeGenPrepare deletes all nearly empty (only phi and branch) basic blocks. This pass can delete loop preheaders which frequently creates critical edges. A preheader can be a convenient place to spill registers to the stack. If the entrance to a loop body is a critical edge, then spills may occur in the loop body rather than immediately before it. This patch protects loop preheaders from deletion in CodeGenPrepare even if they are nearly empty.

Since the patch alters the CFG, it affects a large number of test cases. In most cases, the changes are merely cosmetic (basic blocks have different names or instruction orders change slightly). I am somewhat concerned about the test/CodeGen/Mips/brdelayslot.ll test case. If the loop preheader is not deleted, then the MIPS backend does not take advantage of a branch delay slot. Consequently, I would like some close review by a MIPS expert.

The patch also partially subsumes D16893 from George Burgess IV. George correctly notes that CodeGenPrepare does not actually preserve the dominator tree. I think the dominator tree was usually not valid when CodeGenPrepare ran, but I am using LoopInfo to mark preheaders, so the dominator tree is now always valid before CodeGenPrepare.

Diff Detail

Event Timeline

tjablin updated this revision to Diff 47188.Feb 8 2016, 6:49 AM
tjablin retitled this revision from to Don't delete empty preheaders in CodeGenPrepare if it would create a critical edge.
tjablin updated this object.
tjablin added a subscriber: llvm-commits.

Sorry, I'm not familiar enough with LLVM to feel comfortable stamping this patch. That said, it looks sane to me, with two nits.

Thanks for doing this! :)

lib/CodeGen/CodeGenPrepare.cpp
372

Why can't we just do for (Loop *L : *LI)?

413

typo: "deleeted"

tjablin updated this revision to Diff 47242.Feb 8 2016, 1:10 PM
tjablin edited edge metadata.

Accidentally omitted code for iterating over inner loops. Fix spelling error.

dsanders edited edge metadata.Feb 9 2016, 7:55 AM

I am somewhat concerned about the test/CodeGen/Mips/brdelayslot.ll test case. If the loop preheader is not deleted, then the MIPS backend does not take advantage of a branch delay slot. Consequently, I would like some close review by a MIPS expert.

I see what's happening. Somehow the '%V0<def> = ADDiu %ZERO, 0' has appeared in both %entry and %for.body.preheader. It's then refusing to hoist the one in %for.body.preheader into %entry because this would move the instruction between the one in %entry and the live-in for %for.body.preheader (which contains %V0). It thinks this will change the value even though they both materialize zero into %V0 using the same instruction.

That's a weakness in our algorithm but I wonder why this patch duplicates the instruction. The one in %for.body.preheader presumably comes from the '%s.06 = phi i32 [ %add, %for.body ], [ 0, %for.body.preheader ]' in %for.body, but I can't explain the one in %entry and the delay slot filler would work without that one.

For reference, here's the IR I'm talking about above:

BB#0: derived from LLVM BB %entry
    Live Ins: %A0 %A1
        %vreg8<def> = COPY %A1; GPR32:%vreg8
        %vreg7<def> = COPY %A0; GPR32:%vreg7
        %vreg9<def> = ADDiu %ZERO, 0; GPR32:%vreg9
        BLEZ %vreg8, <BB#3>, %AT<imp-def,dead>; GPR32:%vreg8
        B <BB#1>, %AT<imp-def,dead>
    Successors according to CFG: BB#1(0x50000000 / 0x80000000 = 62.50%) BB#3(0x30000000 / 0x80000000 = 37.50%)

BB#1: derived from LLVM BB %for.body.preheader
    Predecessors according to CFG: BB#0
        %vreg10<def> = ADDiu %ZERO, 0; GPR32:%vreg10
    Successors according to CFG: BB#2(?%)

BB#2: derived from LLVM BB %for.body
    Predecessors according to CFG: BB#1 BB#2
        %vreg0<def> = PHI %vreg7, <BB#1>, %vreg5, <BB#2>; GPR32:%vreg0,%vreg7,%vreg5
        %vreg1<def> = PHI %vreg8, <BB#1>, %vreg4, <BB#2>; GPR32:%vreg1,%vreg8,%vreg4
        %vreg2<def> = PHI %vreg10, <BB#1>, %vreg3, <BB#2>; GPR32:%vreg2,%vreg10,%vreg3
        %vreg11<def> = LW %vreg0, 0; mem:LD4[%lsr.iv1] GPR32:%vreg11,%vreg0
        %vreg3<def> = ADDu %vreg11<kill>, %vreg2; GPR32:%vreg3,%vreg11,%vreg2
        %vreg5<def> = ADDiu %vreg0, 4; GPR32:%vreg5,%vreg0
        %vreg4<def> = ADDiu %vreg1, -1; GPR32:%vreg4,%vreg1
        BNE %vreg4, %ZERO, <BB#2>, %AT<imp-def,dead>; GPR32:%vreg4
        B <BB#3>, %AT<imp-def,dead>
    Successors according to CFG: BB#3(0x04000000 / 0x80000000 = 3.12%) BB#2(0x7c000000 / 0x80000000 = 96.88%)

BB#3: derived from LLVM BB %for.end
    Predecessors according to CFG: BB#0 BB#2
        %vreg6<def> = PHI %vreg9, <BB#0>, %vreg3, <BB#2>; GPR32:%vreg6,%vreg9,%vreg3
        %V0<def> = COPY %vreg6; GPR32:%vreg6
        RetRA %V0<imp-use>
test/CodeGen/Mips/brdelayslot.ll
2–6

On each of these, the preheader contains a duplicate of the 'addiu $2, $zero, 0' at the end of the previous bb.

kbarton accepted this revision.Mar 30 2016, 10:29 AM
kbarton edited edge metadata.

LGTM.
Note that the only test case I looked at in detail was dont-remove-empty-preheader.ll

lib/CodeGen/CodeGenPrepare.cpp
372

This wasn't immediately obvious to me either.
This is needed because of nested loops. The begin() and end() iterators for LoopInfo just contain the topmost loops for a given depth (nest level). For each loop in the list you need to:

  1. Check its preheader;
  2. Add all of the loops contained within it. Line 374 handles adding the topmost loops within the current loop body.
This revision is now accepted and ready to land.Mar 30 2016, 10:29 AM
cycheng closed this revision.Apr 5 2016, 7:15 AM
cycheng edited edge metadata.

On behalf of Tom.
Committed revision: r265397