User Details
- User Since
- Apr 10 2015, 3:10 PM (383 w, 19 h)
Mon, Aug 8
Thanks for the fix. LGTM.
Fri, Aug 5
Thu, Aug 4
Is anything else needed for this patch or any additional review comments?
Tue, Aug 2
This is an interesting bug. The patch does fix a flaw with the current Phi generation by the pipeliner. I think the approach here is a good one. It is a little difficult to understand the reason needed for the two different VRMaps though (much of the code in generatePhis is difficult to understand). But, the two maps do reduce some of the complexity in the code, which is good. I'll accept it once the test case comment is resolved. Thanks for fix the issue!
Mon, Aug 1
Fri, Jul 29
A couple more tests are needed. A test for for the image intrinsic. Also an assembler test for the v_illegal encoding.
Wed, Jul 27
Mon, Jul 25
Jul 14 2022
Jun 28 2022
Jun 27 2022
Jun 25 2022
Hi @critson. Thanks for reducing the test case. I don't see an error with the test case though, so I may be missing something. Or, perhaps I need to see the complete test case? Here's my understanding. With the patch, control flow still converges at TrueExit. It's just that the Exit3 and Exit4 blocks have moved in the CFG so that they appear prior to loop.exit.guard rather than after. Following the exit paths from F, G, and C. I think that should be ok?
Jun 24 2022
Hi @critson, thanks for pointing out the issue, and a potential fix. I'm looking at it now.
Jun 22 2022
Jun 19 2022
Updates to the comments in the improve-order.ll test, based up reviewer suggestions.
Jun 16 2022
Thanks for making this change. It's an interesting case. We handled a related issue, with subregisters, by adding code to preprocessPhiNodes to create copies. I don't have a problem with approach in this patch. There may be other cases in this code that need a similar change? Not sure...
Jun 14 2022
The rationale is for library code that looks like:
void example(float *p, float v) {
if (_ISA_verison == 9008 || __ISA_verison == 9010) global_atomic_fadd(p, v); else generic_atomic_fadd(p, v);
}
Jun 13 2022
No changes with this patch. It's been a while since this patch was added. I was investigating an internal test failure, which turns out to be unrelated to this patch. So, I'd like to move forward with reviewing this again.
May 24 2022
May 20 2022
This looks good. Just a couple of minor comments.
Apr 24 2022
Apr 22 2022
Hi @sgundapa, just checking if you made a similar change that wasn't upstreamed?
Apr 14 2022
Made several changes based upon suggestions and to fix a correctness issue:
- The new function, reorderNodes, must appear before collectInfos because collectInfos collects branch predicate information based upon the order of the nodes, so the wrong branch information was used after nodes were reordered.
- Changed name of the function, from improveNodeOrder to reorderNodes because changing the order is an improvement under certain conditions.
- Only reorder the nodes for large regions. The size of the region can be changed with a command-line option. The issues fixed by this patch are seen only with very large regions, so this flag limits the impact of this patch. I'm open to other ways to do this.
- Since reorderNodes appears before collectInfos, it no longer can use Visited. Instead, reorderNodes keeps track of the basic blocks in Order and uses that as part of the criteria to reorder. We only want to reorder a node that has a predecessor in Order.
Apr 11 2022
Apr 9 2022
Messed up the last revision by including another patch. This hopefully fixes it. The only new change is an additional test case, three_loops, added to test/Transforms/UnifyLoopExits/reduce_guards.ll
Added a test to show the case when we need to move the guard block predecessor from an outer loop to an inner loop. Three nested loops are needed for this. Initially, the guard block predecessor is added to the outer loop, but then need to change to the middle loop.
Apr 7 2022
Changes based upon review comments.
Apr 6 2022
It's great to see extensions to the pipeliner so that it works on more targets and loops that don't assume a hardware loop. A little while back a couple of functions, see shoulgnoreForPipelining, were added to support regular loops. However, I don't see that function used. Perhaps that effort was never completely finished? Do we need to get rid of that function, or try to combine the work you've done with that work?
Feb 22 2022
Changes based upon review. Don't just skip over insertelements with a
non-constant index. Instead, fall through to include additional cost
calculation.
Feb 20 2022
Jan 31 2022
There are some clang-format issues to be addressed before committing. Other than that, the patch LGTM. Thanks!
Jan 7 2022
Looks good to me.
Looks good to me. Thanks!
Dec 23 2021
Dec 22 2021
Changes so that the VGPR to AGPR spill slot can be removed if stack slot coloring has not allocated multiple objects to the same slot. Added a couple extra test cases to check for different cases when the slot can/can't be removed.
Dec 20 2021
Dec 18 2021
Oct 13 2021
Sep 21 2021
Sep 20 2021
Add CHECKs for the alias scope metadata. Removed unintentional ws.
Sep 19 2021
Sep 16 2021
Sep 14 2021
Added amdgpu-registered-target to test case.
Aug 18 2021
Jul 30 2021
Moved checks for pointer type in ScaledReg and BaseRegs to outside the loop.
Jun 28 2021
Jun 26 2021
Jun 24 2021
Rebased
Jun 18 2021
Jun 17 2021
Improved description of legal values, based upon review comments.
Jun 14 2021
Jun 13 2021
Addressed review comments. Fixed doxygen comment, and a check for width <= 32. Also, when converting the G_SBFX/G_UBFX to a shift sequence for the sign extend, clarify that the expectation is that the width range is between 1 and 64-Offset.