This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] SILowerControlFlow::removeMBBifRedundant should not try to change MBB layout if it can fallthrough
AcceptedPublic

Authored by alex-t on Oct 14 2020, 8:39 AM.

Details

Summary

removeMBBifRedundant normally tries to keep predecessors fallthrough when removing redundant MBB.

It has to change MBBs layout to keep the new successor to immediately follow the predecessor of removed MBB.
It only may be allowed in case the new successor itself has no successors to which it fall through.

Diff Detail

Event Timeline

alex-t created this revision.Oct 14 2020, 8:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 14 2020, 8:39 AM
alex-t requested review of this revision.Oct 14 2020, 8:39 AM

Please address couple clang-tidy comments.

llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
689

It can be a regular for loop.

701

Add a blank like after this.

llvm/test/CodeGen/AMDGPU/collapse-endcf.mir
643 ↗(On Diff #298157)

This sentence clearly misses some punctuation.

alex-t updated this revision to Diff 298194.Oct 14 2020, 11:08 AM

Changed according the reviewer request.

alex-t marked 3 inline comments as done.Oct 14 2020, 11:09 AM
This revision is now accepted and ready to land.Oct 14 2020, 11:12 AM
alex-t updated this revision to Diff 298339.Oct 15 2020, 3:34 AM

minor bugfix

foad added a subscriber: foad.Oct 15 2020, 5:11 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
685–686

You don't need to iterate to find the layout successor, just use std::next or similar.

700

Remove this and just return early if it's not redundant?

704–705

Why do we need the SmallVector? Why not just iterate over MBB.predecessors()?

alex-t added inline comments.Oct 15 2020, 1:24 PM
llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
685–686

This is not to find the layout successor. This check if one of the successors is also layout succesor.
Nevertheless, you a right - the loop is not necessary.

700

Thanks for good suggestion. I will address NFC in separate commit soon.
This one is already tested and need to be submitted asap.

704–705

P->ReplaceUsesOfBlockWith(&MBB, Succ) erases P from MBB.predecessors, i.e. calls std::vector.erase(P)
Since it is inside the ReplaceUsesOfBlockWith call I cannot decrement iterator afeter passing it in but before erase execution ("erase(P--)") to keep it valid. So I decided that using separate vector is easier and does not bring too much overhead - it is usually small.

This broke 14 GL_NV_shader_atomic_int64 piglit tests (on Navi 14), e.g. tests/spec/nv_shader_atomic_int64/execution/ssbo-atomicAdd-int.shader_test:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  llvm::PointerIntPair<llvm::ilist_node_base<true>*, 1u, unsigned int, llvm::PointerLikeTypeTraits<llvm::ilist_node_base<true>*>, llvm::PointerIntPairInfo<llvm::ilist_node_base<true>*, 1u, llvm::PointerLikeTypeTraits<llvm::ilist_node_base<true>*> > >::getPointer (this=0x0) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:59
59	  PointerTy getPointer() const { return Info::getPointer(Value); }
[Current thread is 1 (Thread 0x7f590effd700 (LWP 825448))]
(gdb) bt
#0  llvm::PointerIntPair<llvm::ilist_node_base<true>*, 1u, unsigned int, llvm::PointerLikeTypeTraits<llvm::ilist_node_base<true>*>, llvm::PointerIntPairInfo<llvm::ilist_node_base<true>*, 1u, llvm::PointerLikeTypeTraits<llvm::ilist_node_base<true>*> > >::getPointer (this=0x0) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:59
#1  llvm::ilist_node_base<true>::getPrev (this=0x0) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist_node_base.h:42
#2  llvm::ilist_base<true>::transferBeforeImpl (Next=..., First=..., Last=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist_base.h:69
#3  llvm::ilist_base<true>::transferBefore<llvm::ilist_node_impl<llvm::ilist_detail::node_options<llvm::MachineBasicBlock, true, false, void> > > (Next=..., First=..., Last=...)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist_base.h:86
#4  llvm::simple_ilist<llvm::MachineBasicBlock>::splice (this=<optimized out>, I=..., First=..., Last=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/simple_ilist.h:249
#5  llvm::iplist_impl<llvm::simple_ilist<llvm::MachineBasicBlock>, llvm::ilist_traits<llvm::MachineBasicBlock> >::transfer (this=<optimized out>, position=..., L2=..., first=..., last=...)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist.h:293
#6  llvm::iplist_impl<llvm::simple_ilist<llvm::MachineBasicBlock>, llvm::ilist_traits<llvm::MachineBasicBlock> >::splice (this=<optimized out>, where=..., L2=..., first=...)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist.h:336
#7  llvm::iplist_impl<llvm::simple_ilist<llvm::MachineBasicBlock>, llvm::ilist_traits<llvm::MachineBasicBlock> >::splice (this=<optimized out>, where=..., L2=..., N=0x7f590014b8a8)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist.h:345
#8  llvm::MachineFunction::splice (this=<optimized out>, InsertPt=..., MBB=0x7f590014b8a8) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/CodeGen/MachineFunction.h:754
#9  (anonymous namespace)::SILowerControlFlow::removeMBBifRedundant (this=0x7f5900034fe0, MBB=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp:731
#10 (anonymous namespace)::SILowerControlFlow::optimizeEndCf (this=0x7f5900034fe0) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp:627
#11 (anonymous namespace)::SILowerControlFlow::runOnMachineFunction (this=<optimized out>, MF=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp:822
#12 0x00007f591856b8dc in llvm::MachineFunctionPass::runOnFunction (this=0x7f5900034fe0, F=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:73
#13 0x00007f59183357f6 in llvm::FPPassManager::runOnFunction (this=<optimized out>, F=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1519
#14 0x00007f591945ac24 in (anonymous namespace)::CGPassManager::RunPassOnSCC (this=<optimized out>, P=0x7f5900039220, CurSCC=..., CG=..., CallGraphUpToDate=<optimized out>, DevirtualizedCall=<optimized out>)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:178
#15 (anonymous namespace)::CGPassManager::RunAllPassesOnSCC (this=<optimized out>, CurSCC=..., CG=..., DevirtualizedCall=<optimized out>) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:476
#16 (anonymous namespace)::CGPassManager::runOnModule (this=<optimized out>, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:541
#17 0x00007f5918335f26 in (anonymous namespace)::MPPassManager::runOnModule (this=<optimized out>, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1634
#18 llvm::legacy::PassManagerImpl::run (this=0x7f5900013980, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:615
#19 0x00007f591833be8e in llvm::legacy::PassManager::run (this=this@entry=0x7f5900013968, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1761
#20 0x00007f591b57979f in ac_compile_module_to_elf (p=p@entry=0x7f5900013910, module=<optimized out>, pelf_buffer=pelf_buffer@entry=0x5629c6cc29e0, pelf_size=pelf_size@entry=0x5629c6cc29e8)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/IR/Module.h:906
#21 0x00007f591b4c1844 in si_compile_llvm (sscreen=sscreen@entry=0x5629c662a400, binary=binary@entry=0x5629c6cc29e0, conf=conf@entry=0x5629c6cc29f8, compiler=compiler@entry=0x5629c662acb0, ac=ac@entry=0x7f590effb500, 
    debug=debug@entry=0x5629c6cc2360, stage=MESA_SHADER_COMPUTE, name=0x7f591a7ccc24 "Compute Shader", less_optimized=false) at ../src/gallium/drivers/radeonsi/si_shader_llvm.c:104
#22 0x00007f591b4b8aa1 in si_llvm_compile_shader (sscreen=sscreen@entry=0x5629c662a400, compiler=compiler@entry=0x5629c662acb0, shader=shader@entry=0x5629c6cc2920, debug=debug@entry=0x5629c6cc2360, nir=<optimized out>, 
    nir@entry=0x5629c6ce2040, free_nir=<optimized out>) at ../src/gallium/drivers/radeonsi/si_shader.c:1591
#23 0x00007f591b4b9e9f in si_compile_shader (sscreen=0x5629c662a400, compiler=0x5629c662acb0, shader=<optimized out>, debug=0x5629c6cc2360) at ../src/gallium/drivers/radeonsi/si_shader.c:1871
#24 0x00007f591b4badf7 in si_create_shader_variant (sscreen=sscreen@entry=0x5629c662a400, compiler=compiler@entry=0x5629c662acb0, shader=shader@entry=0x5629c6cc2920, debug=debug@entry=0x5629c6cc2360)
    at ../src/gallium/drivers/radeonsi/si_shader.c:2405
#25 0x00007f591b491711 in si_create_compute_state_async (job=job@entry=0x5629c6cc2330, thread_index=thread_index@entry=0) at ../src/gallium/drivers/radeonsi/si_compute.c:185
#26 0x00007f591af59fb1 in util_queue_thread_func (input=input@entry=0x5629c662b9c0) at ../src/util/u_queue.c:308
#27 0x00007f591af59b18 in impl_thrd_routine (p=<optimized out>) at ../include/c11/threads_posix.h:87
#28 0x00007f591c46aea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#29 0x00007f591d071d4f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
alex-t added a comment.EditedOct 23 2020, 7:09 AM

This broke 14 GL_NV_shader_atomic_int64 piglit tests (on Navi 14), e.g. tests/spec/nv_shader_atomic_int64/execution/ssbo-atomicAdd-int.shader_test:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  llvm::PointerIntPair<llvm::ilist_node_base<true>*, 1u, unsigned int, llvm::PointerLikeTypeTraits<llvm::ilist_node_base<true>*>, llvm::PointerIntPairInfo<llvm::ilist_node_base<true>*, 1u, llvm::PointerLikeTypeTraits<llvm::ilist_node_base<true>*> > >::getPointer (this=0x0) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:59
59	  PointerTy getPointer() const { return Info::getPointer(Value); }
[Current thread is 1 (Thread 0x7f590effd700 (LWP 825448))]
(gdb) bt
#0  llvm::PointerIntPair<llvm::ilist_node_base<true>*, 1u, unsigned int, llvm::PointerLikeTypeTraits<llvm::ilist_node_base<true>*>, llvm::PointerIntPairInfo<llvm::ilist_node_base<true>*, 1u, llvm::PointerLikeTypeTraits<llvm::ilist_node_base<true>*> > >::getPointer (this=0x0) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:59
#1  llvm::ilist_node_base<true>::getPrev (this=0x0) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist_node_base.h:42
#2  llvm::ilist_base<true>::transferBeforeImpl (Next=..., First=..., Last=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist_base.h:69
#3  llvm::ilist_base<true>::transferBefore<llvm::ilist_node_impl<llvm::ilist_detail::node_options<llvm::MachineBasicBlock, true, false, void> > > (Next=..., First=..., Last=...)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist_base.h:86
#4  llvm::simple_ilist<llvm::MachineBasicBlock>::splice (this=<optimized out>, I=..., First=..., Last=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/simple_ilist.h:249
#5  llvm::iplist_impl<llvm::simple_ilist<llvm::MachineBasicBlock>, llvm::ilist_traits<llvm::MachineBasicBlock> >::transfer (this=<optimized out>, position=..., L2=..., first=..., last=...)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist.h:293
#6  llvm::iplist_impl<llvm::simple_ilist<llvm::MachineBasicBlock>, llvm::ilist_traits<llvm::MachineBasicBlock> >::splice (this=<optimized out>, where=..., L2=..., first=...)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist.h:336
#7  llvm::iplist_impl<llvm::simple_ilist<llvm::MachineBasicBlock>, llvm::ilist_traits<llvm::MachineBasicBlock> >::splice (this=<optimized out>, where=..., L2=..., N=0x7f590014b8a8)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist.h:345
#8  llvm::MachineFunction::splice (this=<optimized out>, InsertPt=..., MBB=0x7f590014b8a8) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/CodeGen/MachineFunction.h:754
#9  (anonymous namespace)::SILowerControlFlow::removeMBBifRedundant (this=0x7f5900034fe0, MBB=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp:731
#10 (anonymous namespace)::SILowerControlFlow::optimizeEndCf (this=0x7f5900034fe0) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp:627
#11 (anonymous namespace)::SILowerControlFlow::runOnMachineFunction (this=<optimized out>, MF=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp:822
#12 0x00007f591856b8dc in llvm::MachineFunctionPass::runOnFunction (this=0x7f5900034fe0, F=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:73
#13 0x00007f59183357f6 in llvm::FPPassManager::runOnFunction (this=<optimized out>, F=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1519
#14 0x00007f591945ac24 in (anonymous namespace)::CGPassManager::RunPassOnSCC (this=<optimized out>, P=0x7f5900039220, CurSCC=..., CG=..., CallGraphUpToDate=<optimized out>, DevirtualizedCall=<optimized out>)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:178
#15 (anonymous namespace)::CGPassManager::RunAllPassesOnSCC (this=<optimized out>, CurSCC=..., CG=..., DevirtualizedCall=<optimized out>) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:476
#16 (anonymous namespace)::CGPassManager::runOnModule (this=<optimized out>, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:541
#17 0x00007f5918335f26 in (anonymous namespace)::MPPassManager::runOnModule (this=<optimized out>, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1634
#18 llvm::legacy::PassManagerImpl::run (this=0x7f5900013980, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:615
#19 0x00007f591833be8e in llvm::legacy::PassManager::run (this=this@entry=0x7f5900013968, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1761
#20 0x00007f591b57979f in ac_compile_module_to_elf (p=p@entry=0x7f5900013910, module=<optimized out>, pelf_buffer=pelf_buffer@entry=0x5629c6cc29e0, pelf_size=pelf_size@entry=0x5629c6cc29e8)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/IR/Module.h:906
#21 0x00007f591b4c1844 in si_compile_llvm (sscreen=sscreen@entry=0x5629c662a400, binary=binary@entry=0x5629c6cc29e0, conf=conf@entry=0x5629c6cc29f8, compiler=compiler@entry=0x5629c662acb0, ac=ac@entry=0x7f590effb500, 
    debug=debug@entry=0x5629c6cc2360, stage=MESA_SHADER_COMPUTE, name=0x7f591a7ccc24 "Compute Shader", less_optimized=false) at ../src/gallium/drivers/radeonsi/si_shader_llvm.c:104
#22 0x00007f591b4b8aa1 in si_llvm_compile_shader (sscreen=sscreen@entry=0x5629c662a400, compiler=compiler@entry=0x5629c662acb0, shader=shader@entry=0x5629c6cc2920, debug=debug@entry=0x5629c6cc2360, nir=<optimized out>, 
    nir@entry=0x5629c6ce2040, free_nir=<optimized out>) at ../src/gallium/drivers/radeonsi/si_shader.c:1591
#23 0x00007f591b4b9e9f in si_compile_shader (sscreen=0x5629c662a400, compiler=0x5629c662acb0, shader=<optimized out>, debug=0x5629c6cc2360) at ../src/gallium/drivers/radeonsi/si_shader.c:1871
#24 0x00007f591b4badf7 in si_create_shader_variant (sscreen=sscreen@entry=0x5629c662a400, compiler=compiler@entry=0x5629c662acb0, shader=shader@entry=0x5629c6cc2920, debug=debug@entry=0x5629c6cc2360)
    at ../src/gallium/drivers/radeonsi/si_shader.c:2405
#25 0x00007f591b491711 in si_create_compute_state_async (job=job@entry=0x5629c6cc2330, thread_index=thread_index@entry=0) at ../src/gallium/drivers/radeonsi/si_compute.c:185
#26 0x00007f591af59fb1 in util_queue_thread_func (input=input@entry=0x5629c662b9c0) at ../src/util/u_queue.c:308
#27 0x00007f591af59b18 in impl_thrd_routine (p=<optimized out>) at ../include/c11/threads_posix.h:87
#28 0x00007f591c46aea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#29 0x00007f591d071d4f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Could you please share temporary files, assembly dumps and/or compilation options if it is possible? It would speed up the evaluation.
In fact there is at least one known bug that is going to be fixed soon. If I have the command line that leads to the crash above I could check if this is same reason. It really is same reason according to the stack and null pointer in ilist::getPrev() but I need to run it to ensure it at last work.

alex-t reopened this revision.Oct 23 2020, 2:15 PM

Reopened to address refactoring and bugfixing

This revision is now accepted and ready to land.Oct 23 2020, 2:15 PM
alex-t updated this revision to Diff 300411.Oct 23 2020, 2:17 PM

This change addresses the refactoring adviced by foad. It also contain the fix for the case when getNextNode is null if the successor block is the last in MachineFunction.

alex-t marked 3 inline comments as done.Oct 23 2020, 2:19 PM

It seems to remove the test?

It seems to remove the test?

Not really. The test is unchanged. The diff reflects only the change in the SILowerControlFlow.cpp

It seems to remove the test?

Not really. The test is unchanged. The diff reflects only the change in the SILowerControlFlow.cpp

Can you make a separate review? This one has been already submitted.

foad added a comment.Oct 27 2020, 9:40 AM

It would have been better to start a new review for this, instead of reopening one that has already been pushed.

llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
686

Return early if this condition is *not* true.

693

Return early here.

695

Just return S. Then there's no need for Ret.

705

Assert that MBB has exactly one successor?

711

Assert !FallThrough here (i.e. assert that there is at most one block that falls through into MBB)?

724

Why do you have to test InsertBefore here? Can't you always use std::next(FallThrough) as the insert point? Even if FallThrough is the last BB, the insert point should be valid -- it'll be MF.end().

BTW, new change has successfully passed ePSDB

The new review opened to address curent improvements : https://reviews.llvm.org/D90314

llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
711

There is already the check in IR verifier that asserts if the fallthrough block is the layout successor. As we only can have one layout successor there is only one fallthrough.