Working on ROCm at AMD
- User Since
- Oct 17 2012, 11:25 AM (396 w, 6 d)
Fri, May 22
I think the patch is okay, but have a couple of doubts about the tests.
Thu, May 21
I support this enhancement. Thanks for doing it!
Wed, May 20
I am NOT familiar with this transformation. But I believe that the patch is well-formed, and I was able to follow the logic. The tests also demonstrate that it works as intended. LGTM.
Mon, May 18
Sun, May 17
TBH, I spent a small amount of time visualizing the new testcase, but couldn't figure out the problem, nor the solution. I see that the check for currentLoopSize is moved up the control flow a bit, but I didn't see why. A few LLVM_DEBUG lines would be very helpful to trace what the structurizer is doing. Also, can the new test case be reduced further? There are a couple of simple chains, and a few branches. Are all of them necessary? The main problem seems to be an inner loop with just two blocks where both blocks exit. I don't know whether both these properties matter. For example, will the problem happen with a single block that loops on itself?
Test update: Moved the convergent call to where it is actually relevant.
Sat, May 16
Fri, May 15
Mon, May 11
Sun, May 10
LGTM. It would be nice if you can move the following tests to the new file and demonstrate that that they are fixed. But that need not hold up this review any longer.
Thu, May 7
Wed, May 6
I have bad news. This fix does not help with the test needs-fr-ule.ll, i.e., the resulting CFG is incorret if I remove the unify-loop-exits pass. The fix-irreducible pass produces an inner loop with the header "irr.guard" and a latch "while.cond47". This is inside an outer loop with header "while.cond" and many backedges. When I run the structurizer with the current fix on this graph, the backedge of the inner loop is lost. Instead we see a spurious loop between two new blocks Flow5 and Flow4, which is governed by incorrect boolean constants when in fact they should have been %Pred variables. That information is lost along the edge Flow8 to Flow5.
Tue, May 5
The implementation looks okay to me. The new comments help a lot. Waiting for tests mentioned in the comments so far ... one new test (sibling loops where the DFS jumps from one to the other before finishing the first loop) and updates to existing tests.
Sun, May 3
Sat, May 2
It seems the new commit is an incremental update on the previous commit. Can you please submit a single squashed commit, so that the change can be compared against the original state of the trunk?
Wed, Apr 29
This needs more tests. There are a bunch of decisions being taken during the reordering, but the test peresented simply exercises the case of one nested loop inside another. For example, what about two inner loops L1 and L2, where L1 has an exit that reaches the entry of L2? There might be other interesting cases too.
Mon, Apr 27
Apr 26 2020
Apr 23 2020
Thanks @saiislam ... this looks much better!
Apr 22 2020
Can you please submit a squashed single commit for review against the master branch? All the recent commits seem to be relative to previous commits, and I am having trouble looking at the change as a whole.
Apr 21 2020
Added bug ID to the commit description
Apr 15 2020
Apr 14 2020
Apr 8 2020
added an ascii diagram illustrating the transformation
Apr 6 2020
Apr 5 2020
missing clang-format; sorry about that noise!
removed unrelated change; refactored out a dubious nullptr assignment
Apr 4 2020
Apr 2 2020
Apr 1 2020
Mar 30 2020
Mar 29 2020
review comments: reorder loops and merge the predicate logic
Mar 23 2020
require and preserve lower-switch pass
Mar 19 2020
Added a comment about requiring direct branches; removed an empty string from debug output
Mar 17 2020
Mar 16 2020
Here's what I understood what this change is doing, based on just interpreting the names in the code for their English meaning: with this change, it is possible to call a function pointer. You can see a valid swappc in the caller, and the in/out arguments will be setup properly.
Mar 14 2020
Fixed the ascii art example to be correct in terms of SSA. The extra
edge from In1 to Out2 would be nice to have, but it breaks the
pre-condition that Def dominates Use.
Addressed review comments:
- New ascii art.
- Split the large function CreateControlFlowHub into smaller functions.
Mar 11 2020
Mar 10 2020
The commit summary needs improvement. The syntax is not really necessary there, but instead this needs a better explanation of how this builtin fits in with the overall scheme of language-specific and target-specific details of an atomic operation. For example, is this meant only for OpenCL? Does it work with CUDA? Or HIP? What is the behaviour for scope in C++?
Mar 9 2020
Mar 4 2020
addressed review comments
Mar 3 2020
address review comments