This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix issues introduced by aggressive block placement
AbandonedPublic

Authored by cycheng on May 10 2016, 2:04 AM.

Details

Summary

Patch D20017 aggressively choosing the best loop top in a loop, this introduces 2 issues for AMDGPU backend:

  1. Crash issue: it breaks assumption of basic block order in shouldSkip(), e.g.
   bb2<--+   bb2 (From)      Block      Flow (To)
   /   \ |   bb6        => Placement => bb2 (From)
bb6 -> Flow  Flow (To)                  bb6

In original code, To MBB is assumed after From MBB, so it gets crash when this is not true.

  1. Unnecessary branch in fall through path, e.g.
   bb2<--+   Flow (Latch)  
   /   \ |   bb2  (Header) 
bb6 -> Flow  bb6
  
Flow: 
    s_cbranch_execnz bb2
    s_branch end
bb2:
    s_cbranch_execz Flow
bb6:
    s_branch Flow
end:

Flow can fall through bb2, so we can replace two branches into one conditional branch.

Diff Detail

Event Timeline

cycheng updated this revision to Diff 56678.May 10 2016, 2:04 AM
cycheng retitled this revision from to [AMDGPU] Fix issues introduced by aggressive block placement.
cycheng updated this object.

Would implementing AnalyzeBranch etc. help or are those not supposed to be needed for ocrrectness?

I always thought this pass didn't do anything for AMDGPU, since AnalyzeBranch wasn't implemented.

cycheng abandoned this revision.May 18 2016, 5:19 PM

Because D20017 has been abandoned, so abandon this patch, too.
We might need this patch again if "-force-precise-rotation-cost" turn on by default, and do loop rotation on this pattern:

          entry               
            |                 
------> loop.header (body)    
|97%    /       \             
|      /50%      \50%         
--- latch <--- if.then        
       |
       |3%
   loop.end

Thanks for all reviewers!