This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] SILowerControlFlow::removeMBBifRedundant should not try to change MBB layout if it can fallthrough
AcceptedPublic

Authored by alex-t on Oct 14 2020, 8:39 AM.

Details

Summary

removeMBBifRedundant normally tries to keep predecessors fallthrough when removing redundant MBB.

It has to change MBBs layout to keep the new successor to immediately follow the predecessor of removed MBB.
It only may be allowed in case the new successor itself has no successors to which it fall through.

Diff Detail

Event Timeline

alex-t created this revision.Oct 14 2020, 8:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 14 2020, 8:39 AM
alex-t requested review of this revision.Oct 14 2020, 8:39 AM

Please address couple clang-tidy comments.

llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
693

It can be a regular for loop.

705

Add a blank like after this.

llvm/test/CodeGen/AMDGPU/collapse-endcf.mir
643

This sentence clearly misses some punctuation.

alex-t updated this revision to Diff 298194.Oct 14 2020, 11:08 AM

Changed according the reviewer request.

alex-t marked 3 inline comments as done.Oct 14 2020, 11:09 AM
This revision is now accepted and ready to land.Oct 14 2020, 11:12 AM
alex-t updated this revision to Diff 298339.Oct 15 2020, 3:34 AM

minor bugfix

foad added a subscriber: foad.Oct 15 2020, 5:11 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
689–690

You don't need to iterate to find the layout successor, just use std::next or similar.

706

Remove this and just return early if it's not redundant?

713–716

Why do we need the SmallVector? Why not just iterate over MBB.predecessors()?

alex-t added inline comments.Oct 15 2020, 1:24 PM
llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
689–690

This is not to find the layout successor. This check if one of the successors is also layout succesor.
Nevertheless, you a right - the loop is not necessary.

706

Thanks for good suggestion. I will address NFC in separate commit soon.
This one is already tested and need to be submitted asap.

713–716

P->ReplaceUsesOfBlockWith(&MBB, Succ) erases P from MBB.predecessors, i.e. calls std::vector.erase(P)
Since it is inside the ReplaceUsesOfBlockWith call I cannot decrement iterator afeter passing it in but before erase execution ("erase(P--)") to keep it valid. So I decided that using separate vector is easier and does not bring too much overhead - it is usually small.

This broke 14 GL_NV_shader_atomic_int64 piglit tests (on Navi 14), e.g. tests/spec/nv_shader_atomic_int64/execution/ssbo-atomicAdd-int.shader_test:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  llvm::PointerIntPair<llvm::ilist_node_base<true>*, 1u, unsigned int, llvm::PointerLikeTypeTraits<llvm::ilist_node_base<true>*>, llvm::PointerIntPairInfo<llvm::ilist_node_base<true>*, 1u, llvm::PointerLikeTypeTraits<llvm::ilist_node_base<true>*> > >::getPointer (this=0x0) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:59
59	  PointerTy getPointer() const { return Info::getPointer(Value); }
[Current thread is 1 (Thread 0x7f590effd700 (LWP 825448))]
(gdb) bt
#0  llvm::PointerIntPair<llvm::ilist_node_base<true>*, 1u, unsigned int, llvm::PointerLikeTypeTraits<llvm::ilist_node_base<true>*>, llvm::PointerIntPairInfo<llvm::ilist_node_base<true>*, 1u, llvm::PointerLikeTypeTraits<llvm::ilist_node_base<true>*> > >::getPointer (this=0x0) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:59
#1  llvm::ilist_node_base<true>::getPrev (this=0x0) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist_node_base.h:42
#2  llvm::ilist_base<true>::transferBeforeImpl (Next=..., First=..., Last=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist_base.h:69
#3  llvm::ilist_base<true>::transferBefore<llvm::ilist_node_impl<llvm::ilist_detail::node_options<llvm::MachineBasicBlock, true, false, void> > > (Next=..., First=..., Last=...)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist_base.h:86
#4  llvm::simple_ilist<llvm::MachineBasicBlock>::splice (this=<optimized out>, I=..., First=..., Last=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/simple_ilist.h:249
#5  llvm::iplist_impl<llvm::simple_ilist<llvm::MachineBasicBlock>, llvm::ilist_traits<llvm::MachineBasicBlock> >::transfer (this=<optimized out>, position=..., L2=..., first=..., last=...)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist.h:293
#6  llvm::iplist_impl<llvm::simple_ilist<llvm::MachineBasicBlock>, llvm::ilist_traits<llvm::MachineBasicBlock> >::splice (this=<optimized out>, where=..., L2=..., first=...)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist.h:336
#7  llvm::iplist_impl<llvm::simple_ilist<llvm::MachineBasicBlock>, llvm::ilist_traits<llvm::MachineBasicBlock> >::splice (this=<optimized out>, where=..., L2=..., N=0x7f590014b8a8)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist.h:345
#8  llvm::MachineFunction::splice (this=<optimized out>, InsertPt=..., MBB=0x7f590014b8a8) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/CodeGen/MachineFunction.h:754
#9  (anonymous namespace)::SILowerControlFlow::removeMBBifRedundant (this=0x7f5900034fe0, MBB=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp:731
#10 (anonymous namespace)::SILowerControlFlow::optimizeEndCf (this=0x7f5900034fe0) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp:627
#11 (anonymous namespace)::SILowerControlFlow::runOnMachineFunction (this=<optimized out>, MF=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp:822
#12 0x00007f591856b8dc in llvm::MachineFunctionPass::runOnFunction (this=0x7f5900034fe0, F=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:73
#13 0x00007f59183357f6 in llvm::FPPassManager::runOnFunction (this=<optimized out>, F=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1519
#14 0x00007f591945ac24 in (anonymous namespace)::CGPassManager::RunPassOnSCC (this=<optimized out>, P=0x7f5900039220, CurSCC=..., CG=..., CallGraphUpToDate=<optimized out>, DevirtualizedCall=<optimized out>)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:178
#15 (anonymous namespace)::CGPassManager::RunAllPassesOnSCC (this=<optimized out>, CurSCC=..., CG=..., DevirtualizedCall=<optimized out>) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:476
#16 (anonymous namespace)::CGPassManager::runOnModule (this=<optimized out>, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:541
#17 0x00007f5918335f26 in (anonymous namespace)::MPPassManager::runOnModule (this=<optimized out>, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1634
#18 llvm::legacy::PassManagerImpl::run (this=0x7f5900013980, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:615
#19 0x00007f591833be8e in llvm::legacy::PassManager::run (this=this@entry=0x7f5900013968, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1761
#20 0x00007f591b57979f in ac_compile_module_to_elf (p=p@entry=0x7f5900013910, module=<optimized out>, pelf_buffer=pelf_buffer@entry=0x5629c6cc29e0, pelf_size=pelf_size@entry=0x5629c6cc29e8)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/IR/Module.h:906
#21 0x00007f591b4c1844 in si_compile_llvm (sscreen=sscreen@entry=0x5629c662a400, binary=binary@entry=0x5629c6cc29e0, conf=conf@entry=0x5629c6cc29f8, compiler=compiler@entry=0x5629c662acb0, ac=ac@entry=0x7f590effb500, 
    debug=debug@entry=0x5629c6cc2360, stage=MESA_SHADER_COMPUTE, name=0x7f591a7ccc24 "Compute Shader", less_optimized=false) at ../src/gallium/drivers/radeonsi/si_shader_llvm.c:104
#22 0x00007f591b4b8aa1 in si_llvm_compile_shader (sscreen=sscreen@entry=0x5629c662a400, compiler=compiler@entry=0x5629c662acb0, shader=shader@entry=0x5629c6cc2920, debug=debug@entry=0x5629c6cc2360, nir=<optimized out>, 
    nir@entry=0x5629c6ce2040, free_nir=<optimized out>) at ../src/gallium/drivers/radeonsi/si_shader.c:1591
#23 0x00007f591b4b9e9f in si_compile_shader (sscreen=0x5629c662a400, compiler=0x5629c662acb0, shader=<optimized out>, debug=0x5629c6cc2360) at ../src/gallium/drivers/radeonsi/si_shader.c:1871
#24 0x00007f591b4badf7 in si_create_shader_variant (sscreen=sscreen@entry=0x5629c662a400, compiler=compiler@entry=0x5629c662acb0, shader=shader@entry=0x5629c6cc2920, debug=debug@entry=0x5629c6cc2360)
    at ../src/gallium/drivers/radeonsi/si_shader.c:2405
#25 0x00007f591b491711 in si_create_compute_state_async (job=job@entry=0x5629c6cc2330, thread_index=thread_index@entry=0) at ../src/gallium/drivers/radeonsi/si_compute.c:185
#26 0x00007f591af59fb1 in util_queue_thread_func (input=input@entry=0x5629c662b9c0) at ../src/util/u_queue.c:308
#27 0x00007f591af59b18 in impl_thrd_routine (p=<optimized out>) at ../include/c11/threads_posix.h:87
#28 0x00007f591c46aea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#29 0x00007f591d071d4f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
alex-t added a comment.EditedOct 23 2020, 7:09 AM

This broke 14 GL_NV_shader_atomic_int64 piglit tests (on Navi 14), e.g. tests/spec/nv_shader_atomic_int64/execution/ssbo-atomicAdd-int.shader_test:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  llvm::PointerIntPair<llvm::ilist_node_base<true>*, 1u, unsigned int, llvm::PointerLikeTypeTraits<llvm::ilist_node_base<true>*>, llvm::PointerIntPairInfo<llvm::ilist_node_base<true>*, 1u, llvm::PointerLikeTypeTraits<llvm::ilist_node_base<true>*> > >::getPointer (this=0x0) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:59
59	  PointerTy getPointer() const { return Info::getPointer(Value); }
[Current thread is 1 (Thread 0x7f590effd700 (LWP 825448))]
(gdb) bt
#0  llvm::PointerIntPair<llvm::ilist_node_base<true>*, 1u, unsigned int, llvm::PointerLikeTypeTraits<llvm::ilist_node_base<true>*>, llvm::PointerIntPairInfo<llvm::ilist_node_base<true>*, 1u, llvm::PointerLikeTypeTraits<llvm::ilist_node_base<true>*> > >::getPointer (this=0x0) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:59
#1  llvm::ilist_node_base<true>::getPrev (this=0x0) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist_node_base.h:42
#2  llvm::ilist_base<true>::transferBeforeImpl (Next=..., First=..., Last=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist_base.h:69
#3  llvm::ilist_base<true>::transferBefore<llvm::ilist_node_impl<llvm::ilist_detail::node_options<llvm::MachineBasicBlock, true, false, void> > > (Next=..., First=..., Last=...)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist_base.h:86
#4  llvm::simple_ilist<llvm::MachineBasicBlock>::splice (this=<optimized out>, I=..., First=..., Last=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/simple_ilist.h:249
#5  llvm::iplist_impl<llvm::simple_ilist<llvm::MachineBasicBlock>, llvm::ilist_traits<llvm::MachineBasicBlock> >::transfer (this=<optimized out>, position=..., L2=..., first=..., last=...)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist.h:293
#6  llvm::iplist_impl<llvm::simple_ilist<llvm::MachineBasicBlock>, llvm::ilist_traits<llvm::MachineBasicBlock> >::splice (this=<optimized out>, where=..., L2=..., first=...)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist.h:336
#7  llvm::iplist_impl<llvm::simple_ilist<llvm::MachineBasicBlock>, llvm::ilist_traits<llvm::MachineBasicBlock> >::splice (this=<optimized out>, where=..., L2=..., N=0x7f590014b8a8)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist.h:345
#8  llvm::MachineFunction::splice (this=<optimized out>, InsertPt=..., MBB=0x7f590014b8a8) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/CodeGen/MachineFunction.h:754
#9  (anonymous namespace)::SILowerControlFlow::removeMBBifRedundant (this=0x7f5900034fe0, MBB=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp:731
#10 (anonymous namespace)::SILowerControlFlow::optimizeEndCf (this=0x7f5900034fe0) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp:627
#11 (anonymous namespace)::SILowerControlFlow::runOnMachineFunction (this=<optimized out>, MF=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp:822
#12 0x00007f591856b8dc in llvm::MachineFunctionPass::runOnFunction (this=0x7f5900034fe0, F=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:73
#13 0x00007f59183357f6 in llvm::FPPassManager::runOnFunction (this=<optimized out>, F=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1519
#14 0x00007f591945ac24 in (anonymous namespace)::CGPassManager::RunPassOnSCC (this=<optimized out>, P=0x7f5900039220, CurSCC=..., CG=..., CallGraphUpToDate=<optimized out>, DevirtualizedCall=<optimized out>)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:178
#15 (anonymous namespace)::CGPassManager::RunAllPassesOnSCC (this=<optimized out>, CurSCC=..., CG=..., DevirtualizedCall=<optimized out>) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:476
#16 (anonymous namespace)::CGPassManager::runOnModule (this=<optimized out>, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:541
#17 0x00007f5918335f26 in (anonymous namespace)::MPPassManager::runOnModule (this=<optimized out>, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1634
#18 llvm::legacy::PassManagerImpl::run (this=0x7f5900013980, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:615
#19 0x00007f591833be8e in llvm::legacy::PassManager::run (this=this@entry=0x7f5900013968, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1761
#20 0x00007f591b57979f in ac_compile_module_to_elf (p=p@entry=0x7f5900013910, module=<optimized out>, pelf_buffer=pelf_buffer@entry=0x5629c6cc29e0, pelf_size=pelf_size@entry=0x5629c6cc29e8)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/IR/Module.h:906
#21 0x00007f591b4c1844 in si_compile_llvm (sscreen=sscreen@entry=0x5629c662a400, binary=binary@entry=0x5629c6cc29e0, conf=conf@entry=0x5629c6cc29f8, compiler=compiler@entry=0x5629c662acb0, ac=ac@entry=0x7f590effb500, 
    debug=debug@entry=0x5629c6cc2360, stage=MESA_SHADER_COMPUTE, name=0x7f591a7ccc24 "Compute Shader", less_optimized=false) at ../src/gallium/drivers/radeonsi/si_shader_llvm.c:104
#22 0x00007f591b4b8aa1 in si_llvm_compile_shader (sscreen=sscreen@entry=0x5629c662a400, compiler=compiler@entry=0x5629c662acb0, shader=shader@entry=0x5629c6cc2920, debug=debug@entry=0x5629c6cc2360, nir=<optimized out>, 
    nir@entry=0x5629c6ce2040, free_nir=<optimized out>) at ../src/gallium/drivers/radeonsi/si_shader.c:1591
#23 0x00007f591b4b9e9f in si_compile_shader (sscreen=0x5629c662a400, compiler=0x5629c662acb0, shader=<optimized out>, debug=0x5629c6cc2360) at ../src/gallium/drivers/radeonsi/si_shader.c:1871
#24 0x00007f591b4badf7 in si_create_shader_variant (sscreen=sscreen@entry=0x5629c662a400, compiler=compiler@entry=0x5629c662acb0, shader=shader@entry=0x5629c6cc2920, debug=debug@entry=0x5629c6cc2360)
    at ../src/gallium/drivers/radeonsi/si_shader.c:2405
#25 0x00007f591b491711 in si_create_compute_state_async (job=job@entry=0x5629c6cc2330, thread_index=thread_index@entry=0) at ../src/gallium/drivers/radeonsi/si_compute.c:185
#26 0x00007f591af59fb1 in util_queue_thread_func (input=input@entry=0x5629c662b9c0) at ../src/util/u_queue.c:308
#27 0x00007f591af59b18 in impl_thrd_routine (p=<optimized out>) at ../include/c11/threads_posix.h:87
#28 0x00007f591c46aea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#29 0x00007f591d071d4f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Could you please share temporary files, assembly dumps and/or compilation options if it is possible? It would speed up the evaluation.
In fact there is at least one known bug that is going to be fixed soon. If I have the command line that leads to the crash above I could check if this is same reason. It really is same reason according to the stack and null pointer in ilist::getPrev() but I need to run it to ensure it at last work.

alex-t reopened this revision.Oct 23 2020, 2:15 PM

Reopened to address refactoring and bugfixing

This revision is now accepted and ready to land.Oct 23 2020, 2:15 PM
alex-t updated this revision to Diff 300411.Oct 23 2020, 2:17 PM

This change addresses the refactoring adviced by foad. It also contain the fix for the case when getNextNode is null if the successor block is the last in MachineFunction.

alex-t marked 3 inline comments as done.Oct 23 2020, 2:19 PM

It seems to remove the test?

It seems to remove the test?

Not really. The test is unchanged. The diff reflects only the change in the SILowerControlFlow.cpp

It seems to remove the test?

Not really. The test is unchanged. The diff reflects only the change in the SILowerControlFlow.cpp

Can you make a separate review? This one has been already submitted.

foad added a comment.Oct 27 2020, 9:40 AM

It would have been better to start a new review for this, instead of reopening one that has already been pushed.

llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
687

Return early if this condition is *not* true.

687

Just return S. Then there's no need for Ret.

694

Return early here.

738

Assert that MBB has exactly one successor?

744

Assert !FallThrough here (i.e. assert that there is at most one block that falls through into MBB)?

750

Why do you have to test InsertBefore here? Can't you always use std::next(FallThrough) as the insert point? Even if FallThrough is the last BB, the insert point should be valid -- it'll be MF.end().

BTW, new change has successfully passed ePSDB

The new review opened to address curent improvements : https://reviews.llvm.org/D90314

llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
744

There is already the check in IR verifier that asserts if the fallthrough block is the layout successor. As we only can have one layout successor there is only one fallthrough.