LGTM
Tue, Feb 23
Dec 28 2020
Dec 24 2020
The test case is split into 2 separate functions. An llc RUN line is added.
In D91435#2470128, @arsenm wrote: There should be both tests, not one larger function that hits both
Dec 23 2020
A new test was created to cover both failing cases:
- end_cf in a block that has yet another predecessor besides the one defining the exec mask
- given the pattern above, not every visited node denotes a loop; a loop exists only when there is a backedge, i.e. the block's successor dominates the block.
Dec 18 2020
In D91435#2461075, @arsenm wrote: Needs a testcase for the second broken case
Dec 17 2020
There was a bug in SIAnnotateControlFlow. A visited node does not necessarily mean a loop; it may be a CF join instead.
Added a check that the Term successor is visited and dominates Term's parent.
Bug in SIAnnotateControlFlow fixed: simple CF join should not be treated as loop
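The distinction between a CF join and a loop can be illustrated with a small standalone sketch. It is not the SIAnnotateControlFlow code: the toy CFG type and the helper names computeDominators and isBackEdge are assumptions made for illustration only. An edge is treated as a loop backedge only when its target dominates its source, so reaching an already visited join block is not enough.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG (an assumption for this sketch): block i lists its successor
// indices in Succs[i]; block 0 is the entry and every block is reachable.
using CFG = std::vector<std::vector<int>>;

// Iterative dominator computation: Dom[B][D] is true when D dominates B.
std::vector<std::vector<bool>> computeDominators(const CFG &Succs) {
  const std::size_t N = Succs.size();
  assert(N > 0 && "CFG must have an entry block");

  // Build predecessor lists from the successor lists.
  std::vector<std::vector<int>> Preds(N);
  for (std::size_t B = 0; B < N; ++B)
    for (int S : Succs[B])
      Preds[S].push_back(static_cast<int>(B));

  // Start from "every block dominates every block", except that the
  // entry is dominated only by itself, then shrink to a fixed point.
  std::vector<std::vector<bool>> Dom(N, std::vector<bool>(N, true));
  Dom[0].assign(N, false);
  Dom[0][0] = true;

  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t B = 1; B < N; ++B) {
      // Intersect the dominator sets of all predecessors, then add B.
      std::vector<bool> New(N, true);
      for (int P : Preds[B])
        for (std::size_t D = 0; D < N; ++D)
          New[D] = New[D] && Dom[P][D];
      New[B] = true;
      if (New != Dom[B]) {
        Dom[B] = New;
        Changed = true;
      }
    }
  }
  return Dom;
}

// The edge From -> To is a backedge only when To dominates From.
bool isBackEdge(const std::vector<std::vector<bool>> &Dom, int From, int To) {
  return Dom[From][To];
}
```

In a diamond-shaped CFG the join block is reached twice but dominates neither of its predecessors, so neither incoming edge is a backedge; in a loop the header dominates the latch, so the latch-to-header edge is one.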
Dec 16 2020
REQUIRES x86 && amdgpu clause added to the test
Test that ensures the optimization is disabled for targets with divergent CF and enabled otherwise.
Dec 15 2020
Dec 12 2020
Could you share the original testcase then? I only have that reduced one attached to the Jira ticket.
And it works for it.
Nov 26 2020
ping
Nov 24 2020
Ping
Nov 23 2020
In D82194#2356308, @foad wrote: @alex-t are you still planning to work on this? Or has it been (partly or wholly) superseded by @piotr's rG0045786f146e78afee49eee053dc29ebc842fee1?
dyn_cast changed to cast
Nov 19 2020
Ping!
Nov 16 2020
The odd lines removed from the test
I would agree with Stas here.
If you can identify the patterns that require a lookup deeper than 6 levels, you can probably formulate the exact threshold.
Adding tests for such a pattern would make it clear.
It is not clear to me why we need to query divergence information for a MachineSDNode.
After instruction selection is done, all the instructions should already be selected correctly as VALU or SALU, based on the information that is available at the selection stage.
Thus, we can use the isDivergent bit value set for the MachineSDNode in case we need to recompute or update divergence information after selection.
So, instead of adding machine opcodes to isSDNodeSourceOfDivergence, it is better to mark those opcodes right away as they are selected.
Nov 13 2020
Test added
Oct 30 2020
Typo in assert message corrected.
The patch has passed PSDB.
Oct 29 2020
assert expression fixed.
Oct 28 2020
A new review was opened to address the current improvements: https://reviews.llvm.org/D90314
Oct 27 2020
BTW, the new change has successfully passed ePSDB
In D89397#2354798, @rampitec wrote: It seems to remove the test?
Oct 26 2020
Oct 23 2020
This change addresses the refactoring advised by foad. It also contains the fix for the case when getNextNode returns null because the successor block is the last one in the MachineFunction.
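The null case can be sketched outside LLVM with a plain std::list standing in for the block layout. Everything here is a hypothetical illustration, not the SILowerControlFlow code: Block, Layout, the pointer-returning getNextNode helper and moveAfter are invented names, and the real fix may be shaped differently. The point is only that "the block after the successor" does not exist when the successor is last, so the insertion point has to fall back to the end of the list.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Toy stand-ins (assumptions for this sketch, not LLVM types).
struct Block {
  int Id;
};
using Layout = std::list<Block>;

// Mimics a getNextNode-style query: returns the following block,
// or nullptr when It refers to the last block in the layout.
Block *getNextNode(Layout &L, Layout::iterator It) {
  assert(It != L.end() && "expected a valid block");
  auto Next = std::next(It);
  return Next == L.end() ? nullptr : &*Next;
}

// Moves the block at Moved so that it directly follows the block at Succ.
void moveAfter(Layout &L, Layout::iterator Moved, Layout::iterator Succ) {
  assert(Moved != Succ && "cannot move a block after itself");
  if (getNextNode(L, Succ) == nullptr) {
    // Succ is the last block: there is no next node to insert before,
    // so append at the end instead of using a null position.
    L.splice(L.end(), L, Moved);
    return;
  }
  // Otherwise insert before the block that currently follows Succ.
  L.splice(std::next(Succ), L, Moved);
}

// Collects block ids in layout order, for inspection.
std::vector<int> ids(const Layout &L) {
  std::vector<int> Out;
  for (const Block &B : L)
    Out.push_back(B.Id);
  return Out;
}
```

With std::list the end iterator is already a valid insertion point, so the explicit branch is redundant there; it is kept to mirror the situation where the query returns a pointer that may be null.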
Reopened to address refactoring and bugfixing
In D89397#2342146, @michel.daenzer wrote: This broke 14 GL_NV_shader_atomic_int64 piglit tests (on Navi 14), e.g. tests/spec/nv_shader_atomic_int64/execution/ssbo-atomicAdd-int.shader_test:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 llvm::PointerIntPair<llvm::ilist_node_base<true>*, 1u, unsigned int, llvm::PointerLikeTypeTraits<llvm::ilist_node_base<true>*>, llvm::PointerIntPairInfo<llvm::ilist_node_base<true>*, 1u, llvm::PointerLikeTypeTraits<llvm::ilist_node_base<true>*> > >::getPointer (this=0x0) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:59
59 PointerTy getPointer() const { return Info::getPointer(Value); }
[Current thread is 1 (Thread 0x7f590effd700 (LWP 825448))]
(gdb) bt
#0 llvm::PointerIntPair<llvm::ilist_node_base<true>*, 1u, unsigned int, llvm::PointerLikeTypeTraits<llvm::ilist_node_base<true>*>, llvm::PointerIntPairInfo<llvm::ilist_node_base<true>*, 1u, llvm::PointerLikeTypeTraits<llvm::ilist_node_base<true>*> > >::getPointer (this=0x0) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:59
#1 llvm::ilist_node_base<true>::getPrev (this=0x0) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist_node_base.h:42
#2 llvm::ilist_base<true>::transferBeforeImpl (Next=..., First=..., Last=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist_base.h:69
#3 llvm::ilist_base<true>::transferBefore<llvm::ilist_node_impl<llvm::ilist_detail::node_options<llvm::MachineBasicBlock, true, false, void> > > (Next=..., First=..., Last=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist_base.h:86
#4 llvm::simple_ilist<llvm::MachineBasicBlock>::splice (this=<optimized out>, I=..., First=..., Last=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/simple_ilist.h:249
#5 llvm::iplist_impl<llvm::simple_ilist<llvm::MachineBasicBlock>, llvm::ilist_traits<llvm::MachineBasicBlock> >::transfer (this=<optimized out>, position=..., L2=..., first=..., last=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist.h:293
#6 llvm::iplist_impl<llvm::simple_ilist<llvm::MachineBasicBlock>, llvm::ilist_traits<llvm::MachineBasicBlock> >::splice (this=<optimized out>, where=..., L2=..., first=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist.h:336
#7 llvm::iplist_impl<llvm::simple_ilist<llvm::MachineBasicBlock>, llvm::ilist_traits<llvm::MachineBasicBlock> >::splice (this=<optimized out>, where=..., L2=..., N=0x7f590014b8a8) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/ADT/ilist.h:345
#8 llvm::MachineFunction::splice (this=<optimized out>, InsertPt=..., MBB=0x7f590014b8a8) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/CodeGen/MachineFunction.h:754
#9 (anonymous namespace)::SILowerControlFlow::removeMBBifRedundant (this=0x7f5900034fe0, MBB=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp:731
#10 (anonymous namespace)::SILowerControlFlow::optimizeEndCf (this=0x7f5900034fe0) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp:627
#11 (anonymous namespace)::SILowerControlFlow::runOnMachineFunction (this=<optimized out>, MF=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp:822
#12 0x00007f591856b8dc in llvm::MachineFunctionPass::runOnFunction (this=0x7f5900034fe0, F=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:73
#13 0x00007f59183357f6 in llvm::FPPassManager::runOnFunction (this=<optimized out>, F=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1519
#14 0x00007f591945ac24 in (anonymous namespace)::CGPassManager::RunPassOnSCC (this=<optimized out>, P=0x7f5900039220, CurSCC=..., CG=..., CallGraphUpToDate=<optimized out>, DevirtualizedCall=<optimized out>) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:178
#15 (anonymous namespace)::CGPassManager::RunAllPassesOnSCC (this=<optimized out>, CurSCC=..., CG=..., DevirtualizedCall=<optimized out>) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:476
#16 (anonymous namespace)::CGPassManager::runOnModule (this=<optimized out>, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:541
#17 0x00007f5918335f26 in (anonymous namespace)::MPPassManager::runOnModule (this=<optimized out>, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1634
#18 llvm::legacy::PassManagerImpl::run (this=0x7f5900013980, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:615
#19 0x00007f591833be8e in llvm::legacy::PassManager::run (this=this@entry=0x7f5900013968, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1761
#20 0x00007f591b57979f in ac_compile_module_to_elf (p=p@entry=0x7f5900013910, module=<optimized out>, pelf_buffer=pelf_buffer@entry=0x5629c6cc29e0, pelf_size=pelf_size@entry=0x5629c6cc29e8) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/IR/Module.h:906
#21 0x00007f591b4c1844 in si_compile_llvm (sscreen=sscreen@entry=0x5629c662a400, binary=binary@entry=0x5629c6cc29e0, conf=conf@entry=0x5629c6cc29f8, compiler=compiler@entry=0x5629c662acb0, ac=ac@entry=0x7f590effb500, debug=debug@entry=0x5629c6cc2360, stage=MESA_SHADER_COMPUTE, name=0x7f591a7ccc24 "Compute Shader", less_optimized=false) at ../src/gallium/drivers/radeonsi/si_shader_llvm.c:104
#22 0x00007f591b4b8aa1 in si_llvm_compile_shader (sscreen=sscreen@entry=0x5629c662a400, compiler=compiler@entry=0x5629c662acb0, shader=shader@entry=0x5629c6cc2920, debug=debug@entry=0x5629c6cc2360, nir=<optimized out>, nir@entry=0x5629c6ce2040, free_nir=<optimized out>) at ../src/gallium/drivers/radeonsi/si_shader.c:1591
#23 0x00007f591b4b9e9f in si_compile_shader (sscreen=0x5629c662a400, compiler=0x5629c662acb0, shader=<optimized out>, debug=0x5629c6cc2360) at ../src/gallium/drivers/radeonsi/si_shader.c:1871
#24 0x00007f591b4badf7 in si_create_shader_variant (sscreen=sscreen@entry=0x5629c662a400, compiler=compiler@entry=0x5629c662acb0, shader=shader@entry=0x5629c6cc2920, debug=debug@entry=0x5629c6cc2360) at ../src/gallium/drivers/radeonsi/si_shader.c:2405
#25 0x00007f591b491711 in si_create_compute_state_async (job=job@entry=0x5629c6cc2330, thread_index=thread_index@entry=0) at ../src/gallium/drivers/radeonsi/si_compute.c:185
#26 0x00007f591af59fb1 in util_queue_thread_func (input=input@entry=0x5629c662b9c0) at ../src/util/u_queue.c:308
#27 0x00007f591af59b18 in impl_thrd_routine (p=<optimized out>) at ../include/c11/threads_posix.h:87
#28 0x00007f591c46aea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#29 0x00007f591d071d4f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Oct 15 2020
minor bugfix
Oct 14 2020
Changed according to the reviewer's request.
Sep 22 2020
In D87882#2282278, @rampitec wrote: Doesn't loop block self dominate?
Sep 17 2020
Sep 16 2020
tests moved to existing rotl/rotr tests
Sep 15 2020
Tests added. ROTR case added.
Sep 14 2020
Sep 8 2020
In D87107#2261429, @rampitec wrote: In D87107#2259217, @alex-t wrote: The idea is:
For the block that is queried
- Look for its predecessors that can pass control through the S_EXECZ/EXECNZ
- If one is found, look for exec-restoring code starting from the beginning of the block being queried.
- Since exec-restoring code always belongs to the block prologue, search the prologue and, if it is not found, return false.
Considering your comment that exec == 0 does not matter, we'd rather search upwards until the immediate dominator block is encountered to check what we meet first: an exec modification or an exec restore. The problem here is that XOR can be both.
Thanks. More or less you want to disable a split in an empty block preceded by c_branch_exec[n]z. I can understand why this would be a problem. In reality such a block can be either empty or contain another branch, because it does not make any sense to pass control to a block with vector instructions and have no active lanes.
But this leaves a couple of problems:
- It does not disallow a split in a block prologue before exec is restored. This creates exactly the same problem.
- It does not disallow a split even in an empty block where EXEC is not zero, but just wrong. The problem is not zero EXEC, the problem is wrong EXEC; zero is just one case of this.
BTW, even with all of this it is still OK to split an LI of an SGPR. I'd say at the very least the callback needs to take the LI in question as well.
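The three steps of "the idea" quoted above can be written as a small standalone sketch. It uses toy data structures rather than MachineBasicBlock and MachineInstr; the names Op, Instr, Block, hasExecBranch and isValidForSplit are all hypothetical, and returning true when no predecessor branches on exec is an assumption, since the thread only states the "not found, return false" case. The sketch also does not address the objections raised in the reply (a split inside the prologue before the restore, or a wrong but non-zero EXEC).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (assumptions for this sketch, not LLVM types).
enum class Op { ExecRestore, CBranchExecZ, CBranchExecNZ, Other };

struct Instr {
  Op Opcode;
  bool InPrologue; // true while the instruction belongs to the block prologue
};

struct Block {
  std::vector<Instr> Instrs;
  std::vector<int> Preds; // indices of predecessor blocks
};

using Function = std::vector<Block>;

// True when the block contains a branch that depends on exec being (non)zero.
bool hasExecBranch(const Block &B) {
  return std::any_of(B.Instrs.begin(), B.Instrs.end(), [](const Instr &I) {
    return I.Opcode == Op::CBranchExecZ || I.Opcode == Op::CBranchExecNZ;
  });
}

// Step 1: look for a predecessor that can pass control through an exec branch.
// Step 2: if there is one, scan the queried block from its beginning.
// Step 3: only the prologue is searched; no restore found means "false".
bool isValidForSplit(const Function &Fn, int Queried) {
  assert(Queried >= 0 && static_cast<std::size_t>(Queried) < Fn.size());
  const Block &B = Fn[Queried];

  const bool HasExecBranchPred =
      std::any_of(B.Preds.begin(), B.Preds.end(),
                  [&Fn](int P) { return hasExecBranch(Fn[P]); });
  if (!HasExecBranchPred)
    return true; // assumption: nothing to check without such a predecessor

  for (const Instr &I : B.Instrs) {
    if (!I.InPrologue)
      break; // left the prologue without seeing an exec restore
    if (I.Opcode == Op::ExecRestore)
      return true;
  }
  return false;
}
```

A block whose prologue restores exec is accepted, while a block reached through an exec branch with no restore in its prologue is rejected.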
Sep 7 2020
Enhanced PSDB passed
The idea is:
Sep 4 2020
In D87107#2256913, @rampitec wrote: In D87107#2256864, @alex-t wrote: Also I still think that disabling a whole "endif" block is overkill.
It is only disabled if S_OR_B64 exec, ... is in the middle of the block, which should never happen.
while (isBasicBlockPrologue(*J)) { if (IsExecRestore(&*J)) return true; assumes that if exec is restored in the block prologue it is valid
So practically it never happens and the split is effectively only disabled in an empty block? I said it already: it does not matter that exec is zero, what matters is that it does not match. It does not matter that a block is empty either; it is enough to split before s_or to hit the bug.
Also I still think that disabling a whole "endif" block is overkill.
So, since we now have a sensible diff to discuss...
Why did I decide to disallow a split in any block that gets control with exec == 0 and has no restoring code in its prologue?
I just followed the example of the code that already does the same for blocks with interference, a bit further below:
The correct diff is now uploaded
Oops. In fact the diff above is not the one I intended to upload. The SIInstrInfo::IsValidForLISplit is complete nonsense. I probably deleted a part of the function by accident...
I'll get back and upload the working one.
Sep 3 2020
Sep 2 2020
diff rebased to latest trunk
Sep 1 2020
No redundant branches anymore
Aug 28 2020
The only difference is that the redundant branch is now inserted by MachineBasicBlock::updateTerminator(), as Matt suggested.
changed as requested by reviewer
Added MachineDominatorTree and MachineLoopInfo update after redundant block removal.
Aug 27 2020
In D86634#2240606, @arsenm wrote: Should also add a few cases with other empty block situations, including with debug info.
Also should add an example where the original problem occurred
Changes as requested by reviewer.
Aug 26 2020
Jun 26 2020
This small piece was missed from the change.
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 5f1afdd7f10..7180e0a8d52 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -634,6 +634,9 @@ void SIInstrInfo::copyPhysReg(MachineBasicBlock &MBB,
 }
In D82194#2111508, @michel.daenzer wrote:This change broke thousands of piglit gpu profile tests with Mesa radeonsi on Navi 14.
As well as the failures, it caused spurious debug output like:
Test case 'dEQP-VK.subgroups.arithmetic.framebuffer.subgroupmax_int_tess_eval'..
S_CMP_LG_U32 killed $sgpr2_sgpr3, 0, implicit-def $scc
S_CMP_LG_U32 killed $sgpr0_sgpr1, 0, implicit-def $scc
Fail (Failed!)
Jun 25 2020
In D82194#2111508, @michel.daenzer wrote:This change broke thousands of piglit gpu profile tests with Mesa radeonsi on Navi 14.
Jun 24 2020
Jun 23 2020
udivrem.ll checks updated
Formatting fixed. Test extract_vector_dynelt.ll changed.
Jun 20 2020
Code changed according to the reviewer's request
Jun 19 2020
May 28 2020
May 27 2020
test corrected
Ping again. Could you please take a look?