This fixes an issue where values are uniform inside of a loop but
the uses of that value outside the loop can be divergent.
Change-Id: I94d3d2e30cc2a6ae8d59e92cadf6f1b6cb7e708b
Differential D60834
[AMDGPU] Uniform values being used outside loop marked non-divergent rtaylor on Apr 17 2019, 12:53 PM. Authored by
Details
Diff Detail
Event TimelineComment Actions This is a workaround. The structurizer / annotator must be correct without relying on another pass to hide situations they don't handle correctly Comment Actions We have a test case such that a value that is uniform in the loop is used outside the loop where threads might have diverged. For example: define amdgpu_ps void @_amdgpu_ps_main(<4 x i32> inreg %desc, float %divergent, <2 x i32> %ptrish) { br label %.preheader .preheader: %tmp62 = phi i32 [ %tmp105, %bb104 ], [ 0, %bb59 ] cmp and branch here bb104: %tmp105 = add nuw nsw i32 %tmp62, 1 cmp and branch here .loopexit: %load2 = tail call i32 @llvm.amdgcn.s.buffer.load.i32(<4 x i32> %desc, i32 %tmp62, i32 0) } This calls lcssa after StructurizeCFG which inserts PHI nodes into the exit block for this type of value, allowing proper DA of the value after the loop. Comment Actions We discussed going to the new DA which may track these properly. I haven't gotten to finding out if that is the case yet or not. Some of the group thought it was worthwhile to upstream this for now, I've added you to an email chain regarding this issue. Comment Actions It sort of intuitively makes sense to me that the control flow lowering would like LCSSA. However, this should not be handled by adding it directly to the pass pipeline. You can add this as a dependency, e.g. AU.addRequiredID(LCSSAID); I would also like to see the an IR->IR testcase showing LCSSA was implicitly run Comment Actions Actually, what really requires LCSSA? Is it DivergenceAnalysis or StructurizeCFG directly? Comment Actions How I understand the problem is that DA is not looking across blocks and therefore won't see that tmp62 is actually divergent in loop exit (though it is uniform in the loop). LCSSA provides a phi node for the loop exit block (where it is divergent) and allows DA to mark it divergent so that the s_buffer_load can be lowered to a buffer_load. Comment Actions OK, so it seems to me like the dependency should be on DivergenceAnalysis. if you just patch it here, the same problem will occur for the handful of other passes that optionally use it Comment Actions I think there are some misunderstandings here. None of the IR passes require LCSSA. The problem is in getting the divergence data into the SelectionDAG. Specifically, you can have, in a weird mixture of IR and SelectionDAG: loop: ... %uni = ... ; Value is uniform here ... br i1 %div, label %loop, label %next ; Divergent loop exit next: %0 = CopyFromReg N(corresponding to %uni) use %0 In this case, %0 must be labeled divergent. However, %0 does not exist at an IR level, and so the code in isSDNodeSourceOfDivergence can only query for the divergence of %uni. However, %uni itself is uniform. One way to look at the problem is that DivergenceAnalysis::isDivergent is really "isDivergentAtDefinition", and what we need is a query "isDivergentAtUse". Implementing that query isn't entirely trivial, and LCSSA is effectively an alternative way of making the right query. Comment Actions I am hitting this assert in LuxMark with this patch:
0 clang 0x0000000109a1f8bc llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 60 I'm working on reducing it Comment Actions This should really be expressed as a pass dependency, not explicitly adding the pass to the pipeline
Comment Actions Preserving StackProtector in LCSSA to avoid pass scheduling conflict
Comment Actions Added StackProtectorID and changed LCSSA to use it instead of StackProtector directly.
Comment Actions Okay, so the linker errors haven't been resolved. If issues related to this problem are burning people urgently, I think a reasonable quick fix would be to add LCSSA not as a dependency but as a codegen pass. It'd be great if we could avoid that, but as hacks go, that still seems cleaner to me than some of the alternatives. Comment Actions See D62802 and http://lists.llvm.org/pipermail/llvm-dev/2019-June/132751.html for the linker dependency issue. Comment Actions This has been copied over and extended on in https://reviews.llvm.org/D62802 so I'm abandoning this revision. |
You should remove the explicit pass since it's a dependency now