- User Since
- Jul 26 2016, 7:17 AM (235 w, 14 h)
Mon, Dec 28
Dec 24 2020
The test case is split into two separate functions. An llc RUN line is added.
Dec 23 2020
New test created to cover both failing cases:
- end_cf in a block that has another predecessor besides the one defining the exec mask
- given the pattern above, a visited node does not necessarily denote a loop; it does only when there is a back edge, i.e. the block's successor dominates the block.
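The distinction between a CF join and a loop can be sketched outside LLVM as follows. This is a minimal illustration on a toy CFG, not the SIAnnotateControlFlow code: the `CFG` alias, `computeDominators`, and `isBackEdge` are hypothetical names, and the dominator sets are computed with the simple iterative algorithm rather than LLVM's DominatorTree.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG: block I has the successor list Succs[I]; block 0 is
// the entry. All blocks are assumed to be reachable from the entry.
using CFG = std::vector<std::vector<int>>;

// Iterative dominator sets: Dom[B][D] is true when block D dominates block B.
std::vector<std::vector<bool>> computeDominators(const CFG &Succs) {
  assert(!Succs.empty() && "CFG must contain an entry block");
  const std::size_t N = Succs.size();

  // Build predecessor lists from the successor lists.
  std::vector<std::vector<int>> Preds(N);
  for (std::size_t B = 0; B < N; ++B)
    for (int S : Succs[B])
      Preds[S].push_back(static_cast<int>(B));

  // Start with every block dominated by all blocks, except the entry,
  // which is dominated only by itself.
  std::vector<std::vector<bool>> Dom(N, std::vector<bool>(N, true));
  Dom[0].assign(N, false);
  Dom[0][0] = true;

  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t B = 1; B < N; ++B) {
      // Intersect the dominator sets of all predecessors, then add B itself.
      std::vector<bool> New(N, true);
      for (int P : Preds[B])
        for (std::size_t D = 0; D < N; ++D)
          New[D] = New[D] && Dom[P][D];
      New[B] = true;
      if (New != Dom[B]) {
        Dom[B] = New;
        Changed = true;
      }
    }
  }
  return Dom;
}

// The edge From -> To is a back edge only if To dominates From. A successor
// that was merely visited earlier (a join) fails this test.
bool isBackEdge(const std::vector<std::vector<bool>> &Dom, int From, int To) {
  return Dom[From][To];
}
```

In a diamond `{{1, 2}, {3}, {3}, {}}`, block 3 is already visited when it is reached from block 2, yet the edge 2 -> 3 is not a back edge because 3 does not dominate 2. In `{{1}, {2}, {1, 3}, {}}`, the edge 2 -> 1 is a back edge because 1 dominates 2.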
Dec 18 2020
Dec 17 2020
There was a bug in SIAnnotateControlFlow. A visited node does not necessarily mean a loop. It may be a CF join instead.
Added a check that the Term successor is visited and dominates Term's parent.
Bug in SIAnnotateControlFlow fixed: a simple CF join should not be treated as a loop
Dec 16 2020
REQUIRES x86 && amdgpu clause added to the test
Test that ensures the optimization is disabled for targets with divergent CF and enabled otherwise.
Dec 15 2020
Dec 12 2020
Could you share the original test case then? I only have the reduced one attached to the Jira ticket.
And it works for that one.
Nov 26 2020
Nov 24 2020
Nov 23 2020
dyn_cast changed to cast
Nov 19 2020
Nov 16 2020
The odd lines removed from the test
I would agree with Stas here.
If you can identify the patterns that require a lookup deeper than 6 levels, you can probably formulate the exact threshold.
And adding tests for such a pattern would make it clear.
It is not clear to me why we need to query divergence information for a MachineSDNode.
After instruction selection is done, all the instructions should already be selected correctly as VALU vs SALU, based on the information that is available at the selection stage.
Thus, we can use the isDivergent bit value set for the MachineSDNode in case we need to recompute or update divergence information after selection.
So, instead of adding machine opcodes to isSDNodeSourceOfDivergence, it is better to mark those opcodes right away as they are selected.
Nov 13 2020
Oct 30 2020
Typo in assert message corrected.
The patch has passed PSDB.
Oct 29 2020
assert expression fixed.
Oct 28 2020
A new review has been opened to address the current improvements: https://reviews.llvm.org/D90314
Oct 27 2020
BTW, new change has successfully passed ePSDB
Oct 26 2020
Oct 23 2020
This change addresses the refactoring advised by foad. It also contains the fix for the case when getNextNode returns null because the successor block is the last in the MachineFunction.
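The null case mentioned above can be illustrated with a small stand-alone analog. This is only a sketch: `Block`, `nextInLayout`, and `idAfter` are hypothetical names and not the LLVM MachineBasicBlock API; the point is that the lookup of the following block has no result for the last block and the caller has to check for that.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a basic block; not LLVM's MachineBasicBlock.
struct Block {
  int Id;
};

// Analog of asking a block for the next block in layout order: returns
// nullptr when the block at Index is the last one in the function.
const Block *nextInLayout(const std::vector<Block> &Function,
                          std::size_t Index) {
  assert(Index < Function.size() && "block index out of range");
  if (Index + 1 == Function.size())
    return nullptr;
  return &Function[Index + 1];
}

// Returns the Id of the block following the one at Index, or -1 if the
// block is the last one. The null check is the guard the caller needs.
int idAfter(const std::vector<Block> &Function, std::size_t Index) {
  const Block *Next = nextInLayout(Function, Index);
  if (Next == nullptr)
    return -1;
  return Next->Id;
}
```

With three blocks whose Ids are 0, 1, and 2, `idAfter` yields 2 for index 1 and -1 for index 2, where an unguarded dereference would have been invalid.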
Reopened to address refactoring and bugfixing
Oct 15 2020
Oct 14 2020
Changed according to the reviewer's request.
Sep 22 2020
Sep 17 2020
Sep 16 2020
tests moved to existing rotl/rotr tests
Sep 15 2020
Tests added. ROTR case added.
Sep 14 2020
Sep 8 2020
Sep 7 2020
Enhanced PSDB passed
The idea is:
Sep 4 2020
Also, I still think that disabling a whole "endif" block is overkill.
So, since we now have sensible diff to discuss...
Why did I decide to disallow a split in any block that gets control with exec == 0 and has no restoring code in its prologue?
I simply followed the example of the code that already does the same for blocks with interference, a bit further below:
The correct diff is now uploaded
Oops. In fact, the diff above is not the one I intended to upload. SIInstrInfo::IsValidForLISplit is complete nonsense. I probably deleted part of the function by accident...
I'll get back and upload the working one.
Sep 3 2020
Sep 2 2020
diff rebased to latest trunk
Sep 1 2020
No redundant branches anymore
Aug 28 2020
The only difference is that this redundant branch is now inserted by MachineBasicBlock::updateTerminator(), as Matt suggested.
changed as requested by reviewer
Added MachineDominatorTree and MachineLoopInfo update after redundant block removal.
Aug 27 2020
Changes as requested by reviewer.
Aug 26 2020
Jun 26 2020
This small piece was missed from the change.
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 5f1afdd7f10..7180e0a8d52 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -634,6 +634,9 @@ void SIInstrInfo::copyPhysReg(MachineBasicBlock &MBB,
 }
and, as well as the failures, it caused spurious debug output like:
Test case 'dEQP-VK.subgroups.arithmetic.framebuffer.subgroupmax_int_tess_eval'..
S_CMP_LG_U32 killed $sgpr2_sgpr3, 0, implicit-def $scc
S_CMP_LG_U32 killed $sgpr0_sgpr1, 0, implicit-def $scc
Fail (Failed!)
Jun 25 2020
Jun 24 2020
Jun 23 2020
udivrem.ll checks updated
Formatting fixed. test extract_vector_dynelt.ll changed.
Jun 20 2020
Code changed according to the reviewer request
Jun 19 2020
May 28 2020
May 27 2020
Ping again. Could you please take a look?