Page MenuHomePhabricator

sameerds (Sameer Sahasrabuddhe)
User

Projects

User does not belong to any projects.

User Details

User Since
Oct 17 2012, 11:25 AM (424 w, 14 h)

Working on ROCm at AMD

Recent Activity

Sat, Nov 7

sameerds added a comment to D85603: IR: Add convergence control operand bundle and intrinsics.

Roughly speaking, there are subgroup ops with "implicit" thread set (what Vinod calls textually aligned, and what this proposal mostly focuses on, because they require the most additional explanation) and subgroup ops with "explicit" thread set (sm70+).

Sat, Nov 7, 3:42 AM · Restricted Project

Fri, Nov 6

sameerds added a comment to D85603: IR: Add convergence control operand bundle and intrinsics.
  • Will this paint us into a corner wrt CUDA, and specifically sm70+?

/me summons @wash, who is probably a better person to speak to this than me.

My understanding is that the semantics of <sm70 convergent are pretty similar to what is described in these examples. But starting in sm70+, each sync operation takes an arg specifying which threads in the warp participate in the instruction.

I admit I do not fully understand what the purpose of this is. At one point in time I thought it was to let humans write (or compilers generate) code like this, where the identity of the convergent instruction does not matter.

// Warning, does not seem to work on sm75
if (cond)
  __syncwarp(FULL_MASK);
else
  __syncwarp(FULL_MASK);

but my testcase, https://gist.github.com/50d1b5fedc926c879a64436229c1cc05, dies with an illegal-instruction error (715) when I make cond have different values within the warp. So, guess not?

Anyway, clearly I don't fully understand the sm70+ convergence semantics. I'd ideally like someone from nvidia (hi, @wash) to speak to whether we can represent their convergent instruction semantics using this proposal. Then we should also double-check that clang can in fact generate the relevant LLVM IR.

Fri, Nov 6, 1:17 AM · Restricted Project

Tue, Nov 3

sameerds added a comment to D85603: IR: Add convergence control operand bundle and intrinsics.

I believe what is described here about convergent, as best I can understand it, is the semantics of syncthreads in CUDA. This semantics is the same for <sm70 and sm70+. Not clear whether what is described here is a "textually aligned" semantics or unaligned. syncthreads is aligned, meaning that all threads in the threadblock must wait on the same lexical syncthreads().

Tue, Nov 3, 1:03 AM · Restricted Project

Oct 8 2020

sameerds accepted D89051: [FixIrreducible][NewPM] Port -fix-irreducible to NPM.

LGTM. Thanks!

Oct 8 2020, 11:11 PM · Restricted Project

Sep 16 2020

sameerds added inline comments to D85603: IR: Add convergence control operand bundle and intrinsics.
Sep 16 2020, 4:51 AM · Restricted Project
sameerds added inline comments to D85603: IR: Add convergence control operand bundle and intrinsics.
Sep 16 2020, 4:43 AM · Restricted Project

Sep 13 2020

sameerds accepted D87548: [UnifyLoopExits] Fix non-deterministic iteration order.

LGTM!

Sep 13 2020, 5:26 PM · Restricted Project

Sep 7 2020

sameerds accepted D84413: [DA][SDA] SyncDependenceAnalysis re-write.

LGTM, to the extent of my high-level understanding of the analysis. I would recommend waiting a few days and give others a chance to finish the review. I do have some nitpicks, but feel free to decide which ones warrant a change.

Sep 7 2020, 3:03 AM · Restricted Project

Aug 13 2020

sameerds added inline comments to D85603: IR: Add convergence control operand bundle and intrinsics.
Aug 13 2020, 9:40 AM · Restricted Project
sameerds added inline comments to D85603: IR: Add convergence control operand bundle and intrinsics.
Aug 13 2020, 2:57 AM · Restricted Project

Aug 12 2020

sameerds added inline comments to D85603: IR: Add convergence control operand bundle and intrinsics.
Aug 12 2020, 9:59 PM · Restricted Project
sameerds added inline comments to D85603: IR: Add convergence control operand bundle and intrinsics.
Aug 12 2020, 12:03 AM · Restricted Project

Aug 11 2020

sameerds added inline comments to D85603: IR: Add convergence control operand bundle and intrinsics.
Aug 11 2020, 9:49 PM · Restricted Project
sameerds added inline comments to D85603: IR: Add convergence control operand bundle and intrinsics.
Aug 11 2020, 1:42 AM · Restricted Project
sameerds added inline comments to D85603: IR: Add convergence control operand bundle and intrinsics.
Aug 11 2020, 1:25 AM · Restricted Project

Aug 6 2020

sameerds committed rGc530539badd5: [AArch64][NFC] require aarch64 support for hwasan test (authored by sameerds).
[AArch64][NFC] require aarch64 support for hwasan test
Aug 6 2020, 8:56 PM
sameerds closed D85412: [AArch64][NFC] require aarch64 support for hwasan test.
Aug 6 2020, 8:56 PM · Restricted Project
sameerds added reviewers for D85412: [AArch64][NFC] require aarch64 support for hwasan test: aemerson, arsenm.
Aug 6 2020, 1:52 AM · Restricted Project
sameerds requested review of D85412: [AArch64][NFC] require aarch64 support for hwasan test.
Aug 6 2020, 1:50 AM · Restricted Project

Jul 27 2020

sameerds added inline comments to D84413: [DA][SDA] SyncDependenceAnalysis re-write.
Jul 27 2020, 10:23 PM · Restricted Project

Jul 12 2020

sameerds added a comment to D82087: AMDGPU/clang: Add builtins for llvm.amdgcn.ballot.

https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#warp-vote-and-ballot-functions

I think if the language interface insists on fixing the wave size, then I think the correct solution is to implement this in the header based on a wave size macro (which we're desperately missing). The library implementation should be responsible for inserting the extension to 64-bit for wave32

Jul 12 2020, 12:19 AM

Jul 11 2020

sameerds accepted D83584: [AMDGPU] Move LowerSwitch pass to CodeGenPrepare..

LGTM!

Jul 11 2020, 3:45 AM · Restricted Project

Jul 10 2020

sameerds added a comment to D83584: [AMDGPU] Move LowerSwitch pass to CodeGenPrepare..

From the description: "This patch inserts the Lowerswitch pass in an appropriate place to ensure any dead blocks resulting from the transformation will be removed"

Jul 10 2020, 9:44 PM · Restricted Project
sameerds added inline comments to D83584: [AMDGPU] Move LowerSwitch pass to CodeGenPrepare..
Jul 10 2020, 9:42 PM · Restricted Project
sameerds accepted D83562: [fix-irreducible] Skip unreachable predecessors..

LGTM!

Jul 10 2020, 9:33 PM · Restricted Project
sameerds updated subscribers of D83562: [fix-irreducible] Skip unreachable predecessors..

@cdevadas reported that other parts of the AMDGPU backend are also affected by the unreachable blocks being produced in the switch lowering. Instead of fixing each such pass separately, it seems the best way forward it to put the block elimination earlier in the flow. @cdevadas is already working on such a change.

Jul 10 2020, 9:26 AM · Restricted Project

Jul 9 2020

sameerds added a comment to D83394: [AMDGPU] Avoid splitting FLAT offsets in unsafe ways.

Rebase.
Fix silly mistake in checking for negative offsets.

Jul 9 2020, 7:47 PM · Restricted Project

Jul 8 2020

sameerds added a comment to D82087: AMDGPU/clang: Add builtins for llvm.amdgcn.ballot.

The documentation for HIP __ballot seems to indicate that the user does not have to explicitly specify the warp size. How is that achieved with these new builtins? Can this be captured in a lit test here?

Jul 8 2020, 8:51 PM

Jun 18 2020

sameerds committed rG7aad220795b5: [DA] conservatively mark the join of every divergent branch (authored by sameerds).
[DA] conservatively mark the join of every divergent branch
Jun 18 2020, 5:25 AM
sameerds closed D81806: [DA] conservatively mark the join of every divergent branch.
Jun 18 2020, 5:24 AM · Restricted Project

Jun 17 2020

sameerds retitled D81806: [DA] conservatively mark the join of every divergent branch from [DivergenceAnalysis] mark join of divergent loop exits to [DA] conservatively mark the join of every divergent branch.
Jun 17 2020, 9:02 PM · Restricted Project
sameerds updated the diff for D81806: [DA] conservatively mark the join of every divergent branch.

filed a bug; improved the description; added a failing testcase

Jun 17 2020, 8:31 PM · Restricted Project

Jun 16 2020

sameerds added a comment to D81806: [DA] conservatively mark the join of every divergent branch.

@simoll, I've been staring at the code for some time now, and I am no longer sure that I have a proper grasp on the problem. It seems my fix works, but not for the reasons that I thought it should. Running the unmodified analysis on "join-at-loop-exit.ll" shows the following debug output:

Jun 16 2020, 9:51 PM · Restricted Project
sameerds committed rGd3963b3a5f4f: [DA] propagate loop live-out values that get used in a branch (authored by sameerds).
[DA] propagate loop live-out values that get used in a branch
Jun 16 2020, 9:19 PM
sameerds closed D81822: [DA] propagate loop live-out values that get used in a branch.
Jun 16 2020, 9:19 PM · Restricted Project
sameerds added inline comments to D81806: [DA] conservatively mark the join of every divergent branch.
Jun 16 2020, 7:09 AM · Restricted Project
sameerds added a comment to D81806: [DA] conservatively mark the join of every divergent branch.
bb1: 
   %m.uni = <uni>
   %n.uni = <uni>
   br <uni>, %bb2, %bb3

bb2: [..]
   br <div>, %bb2, %bb3

bb3:
   %x.uni = phi [%m.uni, %bb2], [%n.uni %bb1]  <-- divergent loop exit that is not join divergent

I am new to this party. Why is %x marked as uni? Its value will differ amongst threads as branch in bb2 is divergent. Is my understanding incorrect?

Jun 16 2020, 7:09 AM · Restricted Project
sameerds updated the diff for D81822: [DA] propagate loop live-out values that get used in a branch.

eliminate DeferredTerminators; also a small refactor for readability

Jun 16 2020, 4:58 AM · Restricted Project
sameerds added a comment to D81806: [DA] conservatively mark the join of every divergent branch.

I am a bit surprised this is necessary. Actually, join divergence in bb3 should come about as a result of propagateLoopDivergence for the [bb1, bb2] loop and not immediately because it is a divergent exit.

Jun 16 2020, 1:36 AM · Restricted Project

Jun 15 2020

sameerds added inline comments to D81822: [DA] propagate loop live-out values that get used in a branch.
Jun 15 2020, 5:56 AM · Restricted Project

Jun 14 2020

sameerds created D81822: [DA] propagate loop live-out values that get used in a branch.
Jun 14 2020, 10:23 PM · Restricted Project
sameerds added reviewers for D81822: [DA] propagate loop live-out values that get used in a branch: simoll, nhaehnle, kerbowa, foad.
Jun 14 2020, 10:23 PM · Restricted Project
sameerds added reviewers for D81806: [DA] conservatively mark the join of every divergent branch: simoll, nhaehnle, kerbowa, foad.
Jun 14 2020, 9:04 AM · Restricted Project
sameerds created D81806: [DA] conservatively mark the join of every divergent branch.
Jun 14 2020, 8:32 AM · Restricted Project

Jun 9 2020

sameerds committed rGd8f651d3e8e2: [AMDGPU] Enable structurizer workarounds by default (authored by sameerds).
[AMDGPU] Enable structurizer workarounds by default
Jun 9 2020, 1:04 AM
sameerds closed D81211: [AMDGPU] Enable structurizer workarounds by default.
Jun 9 2020, 1:04 AM · Restricted Project

Jun 4 2020

sameerds added inline comments to D81211: [AMDGPU] Enable structurizer workarounds by default.
Jun 4 2020, 10:01 PM · Restricted Project
sameerds abandoned D78900: [HIP][AMDGPU] Enable structurizer workarounds.

Abandoned in favour of enabling the workarounds in the AMDGPU backend: https://reviews.llvm.org/D81211

Jun 4 2020, 7:51 PM · Restricted Project, Restricted Project
sameerds updated the summary of D81211: [AMDGPU] Enable structurizer workarounds by default.
Jun 4 2020, 7:51 PM · Restricted Project
sameerds created D81211: [AMDGPU] Enable structurizer workarounds by default.
Jun 4 2020, 7:19 PM · Restricted Project

Jun 3 2020

sameerds committed rGb3cff3c72092: Utility to dump .dot representation of SelectionDAG without firing viewer (authored by madhur13490).
Utility to dump .dot representation of SelectionDAG without firing viewer
Jun 3 2020, 11:25 PM
sameerds closed D80711: Utility to dump .dot representation of SelectionDAG without firing viewer.
Jun 3 2020, 11:25 PM · Restricted Project
sameerds added inline comments to D80996: [AMDGPU][OpenMP] Fix duplicate copies of arguments in commands.
Jun 3 2020, 7:49 PM · Restricted Project

May 31 2020

sameerds accepted D79037: [StructurizeCFG] Fix region nodes ordering.

This looks really good. Thanks a lot for bringing it this far! Please do check the #include for LoopInfo.h before submitting.

May 31 2020, 9:50 PM · Restricted Project

May 29 2020

sameerds added inline comments to D79037: [StructurizeCFG] Fix region nodes ordering.
May 29 2020, 10:50 PM · Restricted Project
sameerds accepted D80804: [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics.

LGTM, but please wait for approval from @arsenm

May 29 2020, 10:50 PM · Restricted Project
sameerds added a comment to D80804: [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics.

Actually, the question really is about why inc/dec are needed as separate operations either as IR intrinsics or Clang builtins. Why not just expose a __builtin_amdgcn_atomicrmw that takes a scope, and map it to the LLVM atomicrmw? That would be way cleaner. The language can provide convenience functions for inc/dec that internally call the rmw builtin.

At the moment, atomic inc/dec exist along with atomicrmw. This patch only aims to devise a way to make it (inc/dec) accessible from the language level.

May 29 2020, 10:50 PM · Restricted Project
sameerds added a comment to D80804: [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics.

Actually, the question really is about why inc/dec are needed as separate operations either as IR intrinsics or Clang builtins. Why not just expose a __builtin_amdgcn_atomicrmw that takes a scope, and map it to the LLVM atomicrmw? That would be way cleaner. The language can provide convenience functions for inc/dec that internally call the rmw builtin.

May 29 2020, 10:21 AM · Restricted Project
sameerds added inline comments to D80804: [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics.
May 29 2020, 9:45 AM · Restricted Project
sameerds added a comment to D80804: [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics.

The commit description needs fixing. These are not "llvm instructions" they are "AMDGCN intrinsics". They don't exist in the LangRef. Also, I would recommend inverting the first line of the description: "Introduce Clang builtins that are mapped to AMDGCN intrinsics".

May 29 2020, 9:12 AM · Restricted Project
sameerds committed rG0384446c7c24: Remove SVN logic from find_first_existing_vc_file (authored by pdhaliwal).
Remove SVN logic from find_first_existing_vc_file
May 29 2020, 8:08 AM
sameerds committed rG16fef6d0b46f: Fix build failure when source is read only (authored by pdhaliwal).
Fix build failure when source is read only
May 29 2020, 7:35 AM
sameerds closed D79400: [CMAKE] Fix build failure when source directory is read only.
May 29 2020, 7:35 AM · Restricted Project, Restricted Project

May 27 2020

sameerds added inline comments to D79037: [StructurizeCFG] Fix region nodes ordering.
May 27 2020, 10:20 PM · Restricted Project
sameerds accepted D80399: [Local] Prevent `invertCondition` from creating a redundant instruction.

LGTM!

May 27 2020, 1:35 AM · Restricted Project
sameerds accepted D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1.

LGTM!

May 27 2020, 12:30 AM · Restricted Project, Restricted Project

May 26 2020

sameerds added a comment to D79037: [StructurizeCFG] Fix region nodes ordering.

This is a second take of the solution, using SCCs instead of LoopInfo.
This algorithm can also handle irreducible loops, and, generally, any kind of cycle in the graph.

May 26 2020, 11:25 AM · Restricted Project

May 22 2020

sameerds accepted D80091: AMDGPU/GlobalISel: Fix masked control flow with fallthrough blocks.

LGTM!

May 22 2020, 7:28 AM · Restricted Project
sameerds added inline comments to D80399: [Local] Prevent `invertCondition` from creating a redundant instruction.
May 22 2020, 4:45 AM · Restricted Project
sameerds added a comment to D80091: AMDGPU/GlobalISel: Fix masked control flow with fallthrough blocks.

I think the patch is okay, but have a couple of doubts about the tests.

May 22 2020, 3:36 AM · Restricted Project

May 21 2020

sameerds added a comment to D79037: [StructurizeCFG] Fix region nodes ordering.

TBH, I spent a small amount of time visualizing the new testcase, but couldn't figure out the problem, nor the solution. I see that the check for currentLoopSize is moved up the control flow a bit, but I didn't see why.

We need to move up the check for *CurrentLoopSize, as in the case it is zero, then we no longer need to skip nodes, and just continue to the next node/loop.
The simplified, new test case, demonstrate a case where it is needed.

May 21 2020, 10:45 PM · Restricted Project
sameerds added a comment to D80399: [Local] Prevent `invertCondition` from creating a redundant instruction.

I support this enhancement. Thanks for doing it!

May 21 2020, 9:38 PM · Restricted Project

May 20 2020

sameerds accepted D78881: [FlattenCFG] Fix `MergeIfRegion` in case then-path is empty.

I am NOT familiar with this transformation. But I believe that the patch is well-formed, and I was able to follow the logic. The tests also demonstrate that it works as intended. LGTM.

May 20 2020, 11:05 PM · Restricted Project

May 18 2020

sameerds committed rG6c8488436644: [LoopSimplify] don't separate nested loops with convergent calls (authored by sameerds).
[LoopSimplify] don't separate nested loops with convergent calls
May 18 2020, 10:13 PM
sameerds closed D80078: [LoopSimplify] don't separate nested loops with convergent calls.
May 18 2020, 10:13 PM · Restricted Project
sameerds added inline comments to D80078: [LoopSimplify] don't separate nested loops with convergent calls.
May 18 2020, 7:32 PM · Restricted Project
sameerds added inline comments to D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1.
May 18 2020, 8:34 AM · Restricted Project, Restricted Project

May 17 2020

sameerds added inline comments to D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1.
May 17 2020, 10:52 PM · Restricted Project, Restricted Project
sameerds added inline comments to D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1.
May 17 2020, 10:52 PM · Restricted Project, Restricted Project
sameerds added a comment to D79037: [StructurizeCFG] Fix region nodes ordering.

TBH, I spent a small amount of time visualizing the new testcase, but couldn't figure out the problem, nor the solution. I see that the check for currentLoopSize is moved up the control flow a bit, but I didn't see why. A few LLVM_DEBUG lines would be very helpful to trace what the structurizer is doing. Also, can the new test case be reduced further? There are a couple of simple chains, and a few branches. Are all of them necessary? The main problem seems to be an inner loop with just two blocks where both blocks exit. I don't know whether both these properties matter. For example, will the problem happen with a single block that loops on itself?

May 17 2020, 9:16 PM · Restricted Project
sameerds updated the diff for D80078: [LoopSimplify] don't separate nested loops with convergent calls.

Test update: Moved the convergent call to where it is actually relevant.

May 17 2020, 9:02 AM · Restricted Project
sameerds added reviewers for D80078: [LoopSimplify] don't separate nested loops with convergent calls: arsenm, nhaehnle.
May 17 2020, 5:19 AM · Restricted Project
sameerds added a reviewer for D80078: [LoopSimplify] don't separate nested loops with convergent calls: chandlerc.
May 17 2020, 5:19 AM · Restricted Project
sameerds created D80078: [LoopSimplify] don't separate nested loops with convergent calls.
May 17 2020, 4:47 AM · Restricted Project

May 16 2020

sameerds added inline comments to D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1.
May 16 2020, 8:08 PM · Restricted Project, Restricted Project

May 15 2020

sameerds added inline comments to D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1.
May 15 2020, 9:44 AM · Restricted Project, Restricted Project

May 11 2020

sameerds added inline comments to D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1.
May 11 2020, 10:39 PM · Restricted Project, Restricted Project
sameerds added inline comments to D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1.
May 11 2020, 10:07 PM · Restricted Project, Restricted Project
sameerds added inline comments to D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1.
May 11 2020, 7:27 PM · Restricted Project, Restricted Project
sameerds added inline comments to D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1.
May 11 2020, 6:55 PM · Restricted Project, Restricted Project

May 10 2020

sameerds added inline comments to D78900: [HIP][AMDGPU] Enable structurizer workarounds.
May 10 2020, 9:55 PM · Restricted Project, Restricted Project
sameerds accepted D79037: [StructurizeCFG] Fix region nodes ordering.

LGTM. It would be nice if you can move the following tests to the new file and demonstrate that that they are fixed. But that need not hold up this review any longer.

  • test/Transforms/StructurizeCFG/AMDGPU/backedge-id-bug.ll
  • test/Transforms/StructurizeCFG/AMDGPU/backedge-id-bug-xfail.ll
May 10 2020, 7:41 PM · Restricted Project

May 7 2020

sameerds added a comment to D79037: [StructurizeCFG] Fix region nodes ordering.

Actually, I take that back. Consider the following CFG:

A -> H1, H2
H1->B, C
B->H1
C->B, H2
H2->E
E->H2, F

A is the entry and F is the exit. The loops (H1, B, C) and (H2, E) are siblings with backedges B->H1 and E->H2 respectively. C is an exiting block in the first loop that branches to H2. One POT would be "B, F, E, H2, C, H1, A". When iterating in reverse, the loop with H1 is started first, but H2 from the second loop is encountered before the first loop is completed.

In this case, those loops are in different regions, and as this is a region pass, they won't appear in the same POT.

May 7 2020, 5:37 AM · Restricted Project
sameerds added a comment to D79037: [StructurizeCFG] Fix region nodes ordering.

I see. It sure does look like PR25378. Then the only thing left is one new test with two sibling loops. The intention is to cover the case where CurrentLoop is not the parent of L.

I don't think it is possible, although I might be wrong.
Because the way I see it, for loop A be interfered with another loop B, while POT, there must be at least one block that belongs to loop A that is dependent on at least one block in loop B. But then loop B must be finished before loop A, and actually inside it. Otherwise, it won't be recognized as a separate loop.

I hope I explained it clear enough. Otherwise, can you please give an example?

I agree. That means that if L exists, then CurrentLoop must be its parent, right? So instead of checking for this condition, the code will have to assert it.

May 7 2020, 2:48 AM · Restricted Project
sameerds added a comment to D79037: [StructurizeCFG] Fix region nodes ordering.

I see. It sure does look like PR25378. Then the only thing left is one new test with two sibling loops. The intention is to cover the case where CurrentLoop is not the parent of L.

I don't think it is possible, although I might be wrong.
Because the way I see it, for loop A be interfered with another loop B, while POT, there must be at least one block that belongs to loop A that is dependent on at least one block in loop B. But then loop B must be finished before loop A, and actually inside it. Otherwise, it won't be recognized as a separate loop.

I hope I explained it clear enough. Otherwise, can you please give an example?

May 7 2020, 1:41 AM · Restricted Project

May 6 2020

sameerds added a comment to D79037: [StructurizeCFG] Fix region nodes ordering.

The problem with those tests is something else, and not produced by the wrong ordering. You may refer to PR25378 for an example.
Still, this fix the ordering issue and fixes PR41509 (and its duplicates).
I think we should insert this patch, and attend to the other bug as a separate issue.

May 6 2020, 8:35 AM · Restricted Project
sameerds added a comment to D79037: [StructurizeCFG] Fix region nodes ordering.

I have bad news. This fix does not help with the test needs-fr-ule.ll, i.e., the resulting CFG is incorret if I remove the unify-loop-exits pass. The fix-irreducible pass produces an inner loop with the header "irr.guard" and a latch "while.cond47". This is inside an outer loop with header "while.cond" and many backedges. When I run the structurizer with the current fix on this graph, the backedge of the inner loop is lost. Instead we see a spurious loop between two new blocks Flow5 and Flow4, which is governed by incorrect boolean constants when in fact they should have been %Pred variables. That information is lost along the edge Flow8 to Flow5.

May 6 2020, 2:39 AM · Restricted Project

May 5 2020

sameerds added a comment to D79037: [StructurizeCFG] Fix region nodes ordering.

The implementation looks okay to me. The new comments help a lot. Waiting for tests mentioned in the comments so far ... one new test (sibling loops where the DFS jumps from one to the other before finishing the first loop) and updates to existing tests.

May 5 2020, 2:06 AM · Restricted Project

May 3 2020

sameerds added inline comments to D79037: [StructurizeCFG] Fix region nodes ordering.
May 3 2020, 9:18 PM · Restricted Project
sameerds added inline comments to D79037: [StructurizeCFG] Fix region nodes ordering.
May 3 2020, 7:57 AM · Restricted Project